Re: Fw: [xsl] Question on duplicate node elimination

This is a question on "pointers" in XSLT.


The sample ancestor3.xml [1] demonstrates nodes "//*" and
"ids($nodes/ancestor::*)". This excludes the root node.

ancestor4.xml [2] demonstrates "ids($nodes/ancestor::node())" and
nodes "/|//*" (includes root node).


This is the modified key definition needed by dupelim4.xsl [3]:
  <xsl:key name="nodes-by-genid" match="/" use="generate-id()"/>
  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

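With this definition the root node and all element, text, comment and
processing-instruction nodes are covered. Of the seven node types in the
XML data model [4], attribute nodes would additionally need match="@*",
and namespace nodes cannot be matched by a pattern at all.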


Having an id-node-set of the form
  <id>some_id_1</id>
  <id>some_id_2</id>
  ...
  <id>some_id_k</id>

as in [3] makes it possible to (efficiently) "address" the represented
nodes in the XML tree with the key() function.
Every node-set can be represented by such an id-node-set.

Result tree fragments of id-node-sets can be converted to id-node-sets
with the exslt:node-set() function, as in [3].
This allows new id-node-sets to be generated iteratively.


I did a quick search for "XSLT pointer", but only found hits about pointers
in C implementations of XSLT processors, or about "XPointer".


Can representing the current node by
  <id><xsl:value-of select="generate-id()"/></id>

in conjunction with "bulk" conversion to the corresponding (real) node-set by
  "key('nodes-by-genid',exslt:node-set($nodes)/id)"

for an id-node-set $nodes be considered an XSLT "pointer" representation of
the current node, as in C?
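
To make the question concrete, here is a minimal sketch of the "take the
address" / "dereference" pair as a complete stylesheet. It assumes an
XSLT 1.0 processor with exslt:node-set() support (for example xsltproc) and
an input like ancestor3.xml [1]; the variable name "ptrs" is made up for
this illustration and does not appear in dupelim4.xsl [3].

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt">

  <xsl:output method="text"/>

  <!-- same key as above: look up a node by its generate-id() value -->
  <xsl:key name="nodes-by-genid" match="/" use="generate-id()"/>
  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

  <xsl:template match="/">
    <!-- "address-of": store one <id> per selected node
         (hypothetical variable name, chosen for this sketch) -->
    <xsl:variable name="ptrs">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>

    <!-- "dereference": bulk conversion back to the real nodes;
         key() is evaluated against the source document because the
         context node here is still the source root -->
    <xsl:for-each select="key('nodes-by-genid', exslt:node-set($ptrs)/id)">
      <xsl:value-of select="concat(name(), '=', ., '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Unlike a C pointer, such an id is only meaningful within a single
transformation run, since generate-id() values may differ from run to run.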


[1] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor4.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim4.xsl
[4] http://www.w3.org/TR/xpath/#data-model


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



From:       Hermann Stamm-Wilbrandt/Germany/IBM@IBMDE
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/27/2010 03:00 PM
Subject:    Re: Fw: [xsl] Question on duplicate node elimination



Michael,

> ... Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

Yesterday's solution [1], based on the id() function, worked well.


But thinking about it again, the single-file solution below, which applies
the key() function twice for duplicate elimination, is much better:
* it does not need any separately created structure (like idcopy in [1])
* it is really short, just a few lines (not counting comments)
* it works on ALL major browsers (IE is supported through David Carlisle's
  trick [4])

Below are
* execution by xsltproc
* listing of dupelim3.xsl [2]
* listing of ancestor3.xml [3] (open that in a browser)


$ xsltproc dupelim3.xsl ancestor3.xml
<html><pre><h2>Duplicate node elimination by applying key() function
twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)


ids(//*)
a      id2619817
+-b    id2619788
! +-c  id2619830
! +-c  id2619802
+-b    id2619245
! +-c  id2619317
! +-c  id2619321
<hr>
ids(//c):
<id>id2619830</id><id>id2619802</id><id>id2619317</id><id>id2619321</id>
<hr>
nodes="ids(//c)"<br>ids($nodes/ancestor::*):
<id>id2619817</id><id>id2619788</id><id>id2619245</id>
</pre></html>
$
$ cat dupelim3.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl"
>
  <xsl:output method="html"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>


  <xsl:template match="/">

    <!--
         initial node-set sample, represented by <id> nodes
    -->
    <xsl:variable name="nodes">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


    <!--
         do ancestor location step
    -->
    <xsl:variable name="result">
      <!--
           application of "ancestor::*" on $nodes;
           $aux might contain duplicate id nodes
      -->
      <xsl:variable name="aux">
        <!--
             use key() function to determine real nodes
        -->
        <xsl:for-each
          select="key('nodes-by-genid',exslt:node-set($nodes)/id)">
          <!--
              location step on each real node
          -->
          <xsl:for-each select="ancestor::*">
            <!--
                generate <id>s for new nodes
            -->
            <id><xsl:value-of select="generate-id()"/></id>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:variable>

      <!--
           use key() function for duplicate elimination
      -->
      <xsl:for-each select="key('nodes-by-genid',exslt:node-set($aux)/id)">
        <!--
            generate <id>s, now for unique new nodes
        -->
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


<html><pre>
    <h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)

    <!-- node name vs genid output -->
    <xsl:text>&#10;ids(//*)</xsl:text>
    <xsl:for-each select="//*">
      <xsl:value-of select=
        "concat('&#10;',substring('! +-',5-2*count(ancestor::*)),name(),
         substring('    ',1+2*count(ancestor::*)),'  ',generate-id())"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- for verification -->
    <xsl:text>ids(//c): </xsl:text>
    <xsl:copy-of select="$nodes"/>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- output of result -->
    <xsl:text>nodes="ids(//c)"</xsl:text><br/>
    <xsl:text>ids($nodes/ancestor::*): </xsl:text>
    <xsl:copy-of select="$result"/>

    <xsl:text>&#10;</xsl:text>
</pre></html>

  </xsl:template>


<!--
  from http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
-->
<msxsl:script language="JScript" implements-prefix="exslt">
 this['node-set'] =  function (x) {
  return x;
  }
</msxsl:script>

</xsl:stylesheet>
$
$ cat ancestor3.xml
<?xml-stylesheet href="dupelim3.xsl" type="text/xsl"?>
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$


[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00291.html

[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim3.xml
[4] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw: [xsl] Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.
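
As an illustration only, one way such a string-based duplicate elimination
might look in XSLT 1.0 is the recursive named template below. It is a sketch
under stated assumptions, not code from this thread: the template name
"dedup-ids" and its parameters "ids" and "seen" are invented here, and the
input is assumed to be shaped like ancestor3.xml.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- space-separated ids of "//c/ancestor::*", duplicates included -->
    <xsl:variable name="aux">
      <xsl:for-each select="//c">
        <xsl:for-each select="ancestor::*">
          <xsl:value-of select="concat(generate-id(), ' ')"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>

    <xsl:call-template name="dedup-ids">
      <xsl:with-param name="ids" select="string($aux)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- hypothetical helper: emits each id once, in order of first occurrence -->
  <xsl:template name="dedup-ids">
    <xsl:param name="ids"/>
    <!-- ids handled so far, each one surrounded by single spaces -->
    <xsl:param name="seen" select="' '"/>
    <xsl:variable name="list" select="normalize-space($ids)"/>
    <xsl:if test="$list != ''">
      <xsl:variable name="head"
        select="substring-before(concat($list, ' '), ' ')"/>
      <!-- output the first token only if it has not been seen before -->
      <xsl:if test="not(contains($seen, concat(' ', $head, ' ')))">
        <xsl:value-of select="concat($head, ' ')"/>
      </xsl:if>
      <!-- recurse on the remaining tokens, remembering the first one -->
      <xsl:call-template name="dedup-ids">
        <xsl:with-param name="ids" select="substring-after($list, ' ')"/>
        <xsl:with-param name="seen" select="concat($seen, $head, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The space delimiters around each id keep one id from matching as a substring
of another. The recursion depth grows with the number of ids, so very long
lists may need a different strategy on some processors.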

Michael Kay
Saxonica