Re: [xsl] Question on duplicate node elimination

Subject: Re: [xsl] Question on duplicate node elimination
From: Lars Huttar <lars_huttar@xxxxxxx>
Date: Mon, 23 Aug 2010 15:54:05 -0500
 On 8/22/2010 5:12 PM, Hermann Stamm-Wilbrandt wrote:
>> I'm not sure what you find surprising about the results you are seeing.
>> What results would you expect?
> Not surprising.
>
> But how could the algorithm step of "duplicate elimination" be done?
> How can the duplicates be determined and removed, correctly?
>

If I'm understanding your question correctly (are you trying to
implement an XPath processor in XSLT 1.0?), I think it's impossible if
you create the RTF (result tree fragment) simply using xsl:copy-of.
As Mike said, once you've copied nodes, the copies are distinct; there's
no information in the RTF(s) to distinguish copies of the same node from
copies of identical twins.

Could you create the rtf using a "special" attribute that preserves the
id of the node which you are copying? E.g.

	  <xsl:attribute name="originalID" namespace="http://hsw.org/specialNamespaceURI">
	    <xsl:value-of select="generate-id()" />
	  </xsl:attribute>

Then you could use that originalID attribute to determine which copies
came from the same node in the original, and strip out the originalID
attribute after using it.

But I guess this would only work on elements, since only elements can have attributes...
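
A minimal sketch of that idea, reusing your simple.xml and the /a/b/c
selection. The hsw prefix, the attribute name and the first-occurrence
predicate are assumptions for illustration only, and the sketch is
untested across processors:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:exsl="http://exslt.org/common"
   xmlns:hsw="http://hsw.org/specialNamespaceURI"
   exclude-result-prefixes="exsl hsw">

   <xsl:output omit-xml-declaration="yes"/>

   <xsl:template match="/">
     <!-- Copy the parent of each /a/b/c into the RTF, tagging every copy
          with the generate-id() of the node it was copied from. -->
     <xsl:variable name="rtf">
       <xsl:for-each select="/a/b/c">
         <xsl:for-each select="..">
           <xsl:copy>
             <!-- attributes must be written before any child nodes -->
             <xsl:copy-of select="@*"/>
             <xsl:attribute name="hsw:originalID">
               <xsl:value-of select="generate-id()"/>
             </xsl:attribute>
             <xsl:copy-of select="node()"/>
           </xsl:copy>
         </xsl:for-each>
       </xsl:for-each>
     </xsl:variable>

     <!-- Keep only the first copy for each originalID value, and drop
          the helper attribute while writing the result. -->
     <xsl:for-each select="exsl:node-set($rtf)/*[not(@hsw:originalID =
                           preceding-sibling::*/@hsw:originalID)]">
       <xsl:copy>
         <xsl:copy-of select="@*[not(local-name() = 'originalID' and
              namespace-uri() = 'http://hsw.org/specialNamespaceURI')]"/>
         <xsl:copy-of select="node()"/>
       </xsl:copy>
       <xsl:text>&#10;</xsl:text>
     </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
```

On simple.xml this should emit two <b> elements instead of four, one per
distinct parent. Depending on the processor, a leftover xmlns:hsw
declaration may still show up on the output elements.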

Lars



> Perhaps I was not clear enough with my question.
> How can this step (p. 40 from [1]) be implemented in XPath 1.0 plus
> exslt:node-set():
> A location step identifies a new node-set relative to the context node-set.
> The location step is evaluated against each node in the context node-set,
> and the union of the resulting node-sets becomes the context node-set for
> the next step. Location steps consist of an axis identifier, a node test
> and zero or more predicates (see Figure 3-4). ...
>
>
> [1]
> http://www.theserverside.net/tt/books/addisonwesley/EssentialXML/index.tss
>
> Mit besten Gruessen / Best wishes,
>
> Hermann Stamm-Wilbrandt
> Developer, XML Compiler, L3
> WebSphere DataPower SOA Appliances
> ----------------------------------------------------------------------
> IBM Deutschland Research & Development GmbH
> Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschaeftsfuehrung: Dirk Wittkopp
> Sitz der Gesellschaft: Boeblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294
>
>
>
> From:       Michael Kay <mike@xxxxxxxxxxxx>
> To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Date:       08/22/2010 11:53 PM
> Subject:    Re: [xsl] Question on duplicate node elimination
>
>
>
> I'm not sure what you find surprising about the results you are seeing.
> What results would you expect?
>
> xsl:copy-of creates a new node. Copying the same node twice creates two
> copies with distinct identity. Is that the issue?
>
> Michael Kay
> Saxonica
>
> On 22/08/2010 22:25, Hermann Stamm-Wilbrandt wrote:
>> Hello,
>>
>> I have a question on duplicate node elimination.
>>
>> From the XPath 1.0 specification:
>> ...
>> * node-set (an unordered collection of nodes without duplicates)
>> ...
>> An initial sequence of steps is composed together with a following step
>> as follows. The initial sequence of steps selects a set of nodes relative
>> to a context node. Each node in that set is used as a context node for the
>> following step. The sets of nodes identified by that step are unioned
>> together. The set of nodes identified by the composition of the steps is
>> this union.
>> ...
>>
>> So "are unioned together" results in a node-set and that does not contain
>> duplicates.
>>
>> Now how can this algorithm step be realized in XPath 1.0 plus the
>> exslt:node-set function?
>> (this would work in browsers with the technique from David Carlisle [1])
>>
>>
>> This is the output of the stylesheet simple.xsl below on file simple.xml.
>> For the four nodes /a/b/c, their parents are copied into an intermediate
>> result. xsltproc and xalan show, by their generate-id() values, that the
>> four copies are different nodes, whereas the first pair and the last pair
>> are representations of the same node.
>>
>> xsltproc        xalan
>> 1: id2659470    1: AbT0
>> 2: id2659470    2: AbT0
>> 3: id2659354    3: AbT1
>> 4: id2659354    4: AbT1
>>
>> 1: id2659234    1: AbT2
>> 2: id2659244    2: AbT3
>> 3: id2659254    3: AbT4
>> 4: id2659264    4: AbT5
>>
>> 1:<b>           1:<b>
>>      <c>1</c>         <c>1</c>
>>      <c>2</c>         <c>2</c>
>>    </b>             </b>
>> 2:<b>           2:<b>
>>      <c>1</c>         <c>1</c>
>>      <c>2</c>         <c>2</c>
>>    </b>             </b>
>> 3:<b>           3:<b>
>>      <c>1</c>         <c>1</c>
>>      <c>2</c>         <c>2</c>
>>    </b>             </b>
>> 4:<b>           4:<b>
>>      <c>1</c>         <c>1</c>
>>      <c>2</c>         <c>2</c>
>>    </b>             </b>
>>
>>
>>
>> $ cat simple.xml
>> <a>
>>    <b>
>>      <c>1</c>
>>      <c>2</c>
>>    </b>
>>    <b>
>>      <c>1</c>
>>      <c>2</c>
>>    </b>
>> </a>
>> $ cat simple.xsl
>> <xsl:stylesheet version="1.0"
>>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>    xmlns:exsl="http://exslt.org/common">
>>
>>    <xsl:output omit-xml-declaration="yes"/>
>>
>>    <xsl:template match="/">
>>      <xsl:variable name="rtf">
>>        <xsl:for-each select="/a/b/c">
>>          <xsl:copy-of select=".."/>
>>        </xsl:for-each>
>>      </xsl:variable>
>>
>>      <xsl:for-each select="/a/b/c">
>>        <xsl:value-of select="position()"/><xsl:text>:</xsl:text>
>>        <xsl:value-of select="generate-id(..)"/><xsl:text>&#10;</xsl:text>
>>      </xsl:for-each>
>>
>>      <xsl:text>&#10;</xsl:text>
>>
>>      <xsl:for-each select="exsl:node-set($rtf)/*">
>>        <xsl:value-of select="position()"/><xsl:text>:</xsl:text>
>>        <xsl:value-of select="generate-id(.)"/><xsl:text>&#10;</xsl:text>
>>      </xsl:for-each>
>>
>>      <xsl:text>&#10;</xsl:text>
>>
>>      <xsl:for-each select="exsl:node-set($rtf)/*">
>>        <xsl:value-of select="position()"/><xsl:text>:</xsl:text>
>>        <xsl:copy-of select="."/><xsl:text>&#10;</xsl:text>
>>      </xsl:for-each>
>>    </xsl:template>
>>
>> </xsl:stylesheet>
>> $
>>
>>
>> [1] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
>>
>>
>> Mit besten Gruessen / Best wishes,
>>
>> Hermann Stamm-Wilbrandt
>> Developer, XML Compiler, L3
>> WebSphere DataPower SOA Appliances
>> ----------------------------------------------------------------------
>> IBM Deutschland Research & Development GmbH
>> Vorsitzender des Aufsichtsrats: Martin Jetter
>> Geschaeftsfuehrung: Dirk Wittkopp
>> Sitz der Gesellschaft: Boeblingen
>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>