Re: Fw: [xsl] Question on duplicate node elimination

Subject: Re: Fw: [xsl] Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Tue, 24 Aug 2010 20:52:20 +0200
Michael,

>   I haven't understood your logic in any detail, but I wonder if it
> suggests an alternative approach to the problem: namely, avoid creating
> RTFs entirely, at least for intermediate results. Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

that is a cool idea.
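
A minimal sketch of the string-side duplicate elimination, assuming plain XSLT 1.0 with no extension functions. The template name "dedupe" and its parameters "ids" and "seen" are made up for illustration; they are not part of idc.xsl.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- hypothetical helper: emits the tokens of $ids, dropping repeats -->
  <xsl:template name="dedupe">
    <!-- space-separated list of generate-id() values -->
    <xsl:param name="ids"/>
    <!-- tokens emitted so far, each one surrounded by single spaces -->
    <xsl:param name="seen" select="' '"/>
    <xsl:variable name="rest" select="normalize-space($ids)"/>
    <xsl:if test="$rest != ''">
      <xsl:variable name="head"
        select="substring-before(concat($rest, ' '), ' ')"/>
      <xsl:variable name="tail" select="substring-after($rest, ' ')"/>
      <xsl:choose>
        <!-- token already emitted: skip it -->
        <xsl:when test="contains($seen, concat(' ', $head, ' '))">
          <xsl:call-template name="dedupe">
            <xsl:with-param name="ids" select="$tail"/>
            <xsl:with-param name="seen" select="$seen"/>
          </xsl:call-template>
        </xsl:when>
        <!-- first occurrence: emit it and remember it -->
        <xsl:otherwise>
          <xsl:value-of select="concat($head, ' ')"/>
          <xsl:call-template name="dedupe">
            <xsl:with-param name="ids" select="$tail"/>
            <xsl:with-param name="seen" select="concat($seen, $head, ' ')"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="dedupe">
      <xsl:with-param name="ids" select="'id1 id2 id1 id3 id2'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

For the sample list this should emit id1 id2 id3, each token followed by a space. The recursion depth grows with the number of tokens, so very long lists may need a processor that optimizes tail calls.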

And reading your suggestion of a whitespace-separated list of strings, I
thought of the id() function.

This function can do the duplicate elimination "for free"!

Given a document whose DOCTYPE declares an ID attribute, and a
whitespace-separated string of ids, calling id() with that string not only
returns all the nodes with the given ids -- it also eliminates duplicate
nodes, because the result is a node-set ...
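
A small sketch of that effect, assuming it is run against the file idc2.xml shown further down (for example as "xsltproc dedup.xsl idc2.xml"; the file name dedup.xsl is made up). The two id values are taken from that listing; everything else is illustrative.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- id2335172 is listed twice, but id() returns a node-set,
         so the matching node is selected only once -->
    <xsl:variable name="ids"
      select="'id2335172 id2335168 id2335172'"/>

    <!-- number of distinct nodes selected by the id list -->
    <xsl:value-of select="count(id($ids))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- the selected nodes themselves, in document order -->
    <xsl:copy-of select="id($ids)"/>
  </xsl:template>
</xsl:stylesheet>
```

With the DOCTYPE of idc2.xml in effect, the count should be 2 rather than 3. Without an ID declaration, id() finds nothing at all.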


I figured out how to create the DOCTYPE declaration while creating output
with xsl:text. Generating such an output XML file works perfectly, as can
be seen in the demo idc.xsl [1] and below.
File idc2.xml is the output generated by calling template idcopy for file
simple2.xml.

The big question now is whether exslt:node-set() supports DOCTYPE
definitions, and how.  idc.xsl shows an attempt which does not work.
Accessing an element by its id works for document('idc2.xml') but
does not work for document(exslt:node-set($rtf)), although both are
generated identically by a call to template idcopy.
The difference seems to be the parsing from file idc2.xml ...


Is DOCTYPE supported by exslt:node-set()?
Is the generation of DOCTYPE by <xsl:text> OK for this purpose?
Can the id() function be made to work for duplicate elimination
in some other way?
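
One possible fallback, should id() turn out not to work on the result of exslt:node-set(): a predicate that tests each node's @id against the space-delimited list. This is only a sketch under that assumption; the ids n1 and n2 and the inline variable content are stand-ins for the real output of template idcopy.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt">
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- stand-in for the RTF produced by template idcopy -->
    <xsl:variable name="rtf">
      <node id="n1" type="text" value="3"/>
      <node id="n2" type="text" value="4"/>
    </xsl:variable>

    <!-- n2 is listed twice; pad with spaces so that only whole
         tokens can match -->
    <xsl:variable name="ids"
      select="concat(' ', normalize-space('n2 n1 n2'), ' ')"/>

    <!-- a path expression yields a node-set, so each matching node
         appears once, however often its id is listed -->
    <xsl:copy-of select="exslt:node-set($rtf)//node
      [contains($ids, concat(' ', @id, ' '))]"/>
  </xsl:template>
</xsl:stylesheet>
```

This needs no DOCTYPE, but unlike id() it scans every node element for each lookup, so it is likely to be slower on large trees.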


$ xsltproc idc.xsl simple2.xml

----------
<node id="id2335172" type="text" value="4"/>
$ cat simple2.xml
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$ cat idc2.xml

<!DOCTYPE node [ <!ATTLIST node id ID #REQUIRED> ]>
<node id="id2335401" type="element" name="a"><node id="id2335402"
type="text" value="&#10;  "/><node id="id2335404" type="element"
name="b"><node id="id2335405" type="text" value="&#10;    "/><node
id="id2335406" type="element" name="c"><node id="id2335407" type="text"
value="1"/></node><node id="id2335408" type="text" value="&#10;    "/><node
id="id2335409" type="element" name="c"><node id="id2335162" type="text"
value="2"/></node><node id="id2335163" type="text" value="&#10;  "/>
</node><node id="id2335164" type="text" value="&#10;  "/><node
id="id2335165" type="element" name="b"><node id="id2335166" type="text"
value="&#10;    "/><node id="id2335167" type="element" name="c"><node
id="id2335168" type="text" value="3"/></node><node id="id2335169"
type="text" value="&#10;    "/><node id="id2335170" type="element"
name="c"><node id="id2335172" type="text" value="4"/></node><node
id="id2335173" type="text" value="&#10;  "/></node><node id="id2335174"
type="text" value="&#10;"/></node>
$
$ cat idc.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt"
>
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:key name="nodes-by-id" match="node()" use="@id"/>

  <xsl:template match="/">
    <xsl:variable name="rtf">
      <xsl:call-template name="idcopy"/>
    </xsl:variable>

    <xsl:variable name="id1" select=
      "string(exslt:node-set($rtf)//node[@type='text'][@value='4']/@id)"/>

    <xsl:for-each select="document(exslt:node-set($rtf))">
      <xsl:copy-of select="id($id1)"/>
    </xsl:for-each>

<xsl:text>&#10;----------&#10;</xsl:text>

    <xsl:variable name="id2" select=
      "string(document('idc2.xml')//node[@type='text'][@value='4']/@id)"/>

    <xsl:for-each select="document('idc2.xml')">
      <xsl:copy-of select="id($id2)"/>
    </xsl:for-each>
  </xsl:template>



  <xsl:template name="idcopy">
    <xsl:text disable-output-escaping="yes">
      <![CDATA[<!DOCTYPE node [ <!ATTLIST node id ID #REQUIRED> ]>]]>
    </xsl:text>

    <xsl:choose>
      <xsl:when test="count(. | ../namespace::*) !=
                      count(../namespace::*)">
        <xsl:apply-templates select="." mode="idcopy"/>
      </xsl:when>

      <xsl:otherwise>
        <node id="{generate-id()}" type="namespace"
              name="{name()}" value="{.}"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="@*" mode="idcopy">
    <node id="{generate-id()}" type="attribute"
          name="{name()}" value="{.}"/>
  </xsl:template>

  <xsl:template match="node()" mode="idcopy">

    <node id="{generate-id()}" type="element" name="{name()}">

      <xsl:apply-templates select="@*" mode="idcopy"/>

      <xsl:for-each select="namespace::*">
        <xsl:if test="not(.=../../namespace::*) and name()!='xml'">
          <node id="{generate-id()}" type="namespace"
                name="{name()}" value="{.}"/>
        </xsl:if>
      </xsl:for-each>

      <xsl:apply-templates mode="idcopy"
        select="*|text()|comment()|processing-instruction()"/>
    </node>
  </xsl:template>

  <xsl:template match="comment()" mode="idcopy">
    <node id="{generate-id()}" type="comment" value="{.}"/>
  </xsl:template>

  <xsl:template match="processing-instruction()" mode="idcopy">
    <node id="{generate-id()}" type="processing-instruction"
          value="{.}"/>
  </xsl:template>

  <xsl:template match="text()" mode="idcopy">
    <node id="{generate-id()}" type="text" value="{.}"/>
  </xsl:template>

</xsl:stylesheet>
$


[1] http://stamm-wilbrandt.de/en/xsl-list/idc.xsl


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw: [xsl] Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
