Re: Fw: [xsl] Question on duplicate node elimination

Subject: Re: Fw: [xsl] Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Fri, 27 Aug 2010 14:59:56 +0200
Michael,

> ... Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

Yesterday's solution [1], based on the id() function, worked well.


But after thinking about it again, I believe the single-file solution below,
which applies the key() function twice for duplicate elimination, is much better:
* it does not need any separately created structure (like idcopy in [1])
* it is really short, just a few lines (not counting comments)
* it works in ALL major browsers (IE support via David Carlisle's trick [4])
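Why does applying key() a second time remove duplicates? key() returns a node-set, and a node-set never contains the same node twice; xsl:for-each then processes it in document order. The Python sketch below models only that lookup step. `Node`, `build_key_index` and `key` are hypothetical stand-ins (not an XSLT API), and the generate-id() strings are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a source-tree node."""
    name: str
    doc_order: int  # position of the node in document order


def build_key_index(nodes):
    """Model of <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>.
    generate-id() is faked here as a deterministic string per node."""
    return {f"id{n.doc_order}": n for n in nodes}


def key(index, id_values):
    """Model of key('nodes-by-genid', $ids): look up every id value and
    return each matching node once, in document order."""
    found = {index[v] for v in id_values if v in index}
    return sorted(found, key=lambda n: n.doc_order)


if __name__ == "__main__":
    nodes = [Node("a", 1), Node("b", 2), Node("c", 3), Node("b", 4)]
    idx = build_key_index(nodes)
    # "id1" and "id2" occur twice, as after an ancestor step over several nodes
    result = key(idx, ["id2", "id1", "id2", "id1", "id4"])
    print([f"id{n.doc_order}" for n in result])  # ['id1', 'id2', 'id4']
```

The first key() call turns the <id> list into real nodes so a location step can be applied; the second call maps the possibly duplicated ids of the step's result back to nodes, which is where the duplicates disappear.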

Below are
* the execution by xsltproc
* the listing of dupelim3.xsl [2]
* the listing of ancestor3.xml [3] (open that one in a browser)


$ xsltproc dupelim3.xsl ancestor3.xml
<html><pre><h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)


ids(//*)
a      id2619817
+-b    id2619788
! +-c  id2619830
! +-c  id2619802
+-b    id2619245
! +-c  id2619317
! +-c  id2619321
<hr>
ids(//c):
<id>id2619830</id><id>id2619802</id><id>id2619317</id><id>id2619321</id>
<hr>
nodes="ids(//c)"<br>ids($nodes/ancestor::*):
<id>id2619817</id><id>id2619788</id><id>id2619245</id>
</pre></html>
$
$ cat dupelim3.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl"
>
  <xsl:output method="html"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>


  <xsl:template match="/">

    <!--
         initial node-set sample, represented by <id> nodes
    -->
    <xsl:variable name="nodes">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


    <!--
         do ancestor location step
    -->
    <xsl:variable name="result">
      <!--
           application of "ancestor::*" on $nodes;
           $aux might contain duplicate id nodes
      -->
      <xsl:variable name="aux">
        <!--
             use key() function to determine real nodes
        -->
        <xsl:for-each select="key('nodes-by-genid',exslt:node-set($nodes)/id)">
          <!--
              location step on each real node
          -->
          <xsl:for-each select="ancestor::*">
            <!--
                generate <id>s for new nodes
            -->
            <id><xsl:value-of select="generate-id()"/></id>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:variable>

      <!--
           use key() function for duplicate elimination
      -->
      <xsl:for-each select="key('nodes-by-genid',exslt:node-set($aux)/id)">
        <!--
            generate <id>s, now for unique new nodes
        -->
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


<html><pre>
    <h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)

    <!-- node name vs genid output -->
    <xsl:text>&#10;ids(//*)</xsl:text>
    <xsl:for-each select="//*">
      <xsl:value-of select=
        "concat('&#10;',substring('! +-',5-2*count(ancestor::*)),name(),
         substring('    ',1+2*count(ancestor::*)),'  ',generate-id())"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- for verification -->
    <xsl:text>ids(//c): </xsl:text>
    <xsl:copy-of select="$nodes"/>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- output of result -->
    <xsl:text>nodes="ids(//c)"</xsl:text><br/>
    <xsl:text>ids($nodes/ancestor::*): </xsl:text>
    <xsl:copy-of select="$result"/>

    <xsl:text>&#10;</xsl:text>
</pre></html>

  </xsl:template>


<!--
  from http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
-->
<msxsl:script language="JScript" implements-prefix="exslt">
 this['node-set'] =  function (x) {
  return x;
  }
</msxsl:script>

</xsl:stylesheet>
$
$ cat ancestor3.xml
<?xml-stylesheet href="dupelim3.xsl" type="text/xsl"?>
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$
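To check the expected result independently of a browser, here is a Python model of what dupelim3.xsl computes on ancestor3.xml. It is a sketch under stated assumptions: ElementTree stands in for the XSLT processor, the generate-id() values are made up from document order, and `run_dupelim3` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ANCESTOR3_XML = """<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>"""


def run_dupelim3(xml_text):
    root = ET.fromstring(xml_text)
    elements = list(root.iter())  # document order: a, b, c, c, b, c, c
    # Stand-in for generate-id(): made-up ids based on document order.
    genid = {el: f"id{i}" for i, el in enumerate(elements, start=1)}
    by_genid = {v: el for el, v in genid.items()}  # model of the xsl:key
    order = {el: i for i, el in enumerate(elements)}
    # ElementTree has no ancestor axis, so build a child -> parent map.
    parent = {child: p for p in elements for child in p}

    def key(ids):
        # key() yields each node once; for-each visits them in document order.
        return sorted({by_genid[i] for i in ids}, key=order.__getitem__)

    def ancestors(el):
        out = []
        while el in parent:
            el = parent[el]
            out.append(el)
        return list(reversed(out))  # document order, as for-each would see it

    # $nodes: <id>s for //c
    nodes = [genid[el] for el in elements if el.tag == "c"]
    # $aux: ancestor::* of every real node; contains duplicates
    aux = [genid[anc] for el in key(nodes) for anc in ancestors(el)]
    # $result: the second key() call eliminates the duplicates
    result = [genid[el] for el in key(aux)]
    return nodes, aux, result


if __name__ == "__main__":
    print(run_dupelim3(ANCESTOR3_XML)[2])  # ['id1', 'id2', 'id5']
```

As in the xsltproc run, four <c> ids go in, the ancestor step produces eight ids (each <c> contributes <a> and its <b>), and three unique ancestors (<a> and both <b>) come out.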


[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00291.html
[2] http://stamm-wilbrandt.de/en/xsl-list/dupelim3.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[4] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Commercial register: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw: [xsl] Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
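The string step in Michael's suggestion (a node-set kept as a space-separated list of generate-id values) can be sketched as follows. This is a Python model, not XSLT; in XSLT 1.0 the same loop would typically be a recursive named template built on substring-before(), substring-after() and contains(). `dedupe_id_string` is a hypothetical name.

```python
def dedupe_id_string(ids: str) -> str:
    """Remove duplicate generate-id values from a space-separated string,
    keeping the first occurrence of each value."""
    acc = ""
    for tok in ids.split():
        # Pad with spaces so "id1" is not found inside "id12"; this mirrors the
        # XPath 1.0 test contains(concat(' ', $acc, ' '), concat(' ', $tok, ' ')).
        if f" {tok} " not in f" {acc} ":
            acc = f"{acc} {tok}" if acc else tok
    return acc


if __name__ == "__main__":
    print(dedupe_id_string("id12 id1 id12 id3 id1"))  # id12 id1 id3
```

Note that the result keeps first-occurrence order rather than document order; the string alone does not encode document order, so restoring it needs a lookup back to the nodes (for example via key()).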
