Subject: Re: Fw: [xsl] Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Mon, 30 Aug 2010 20:53:41 +0200

This is a question on "pointers" in XSLT.

The sample ancestor3.xml [1] demonstrates the nodes "//*" and "ids($nodes/ancestor::*)", which excludes the root node. ancestor4.xml [2] demonstrates "ids($nodes/ancestor::node())" and the nodes "/|//*" (which includes the root node).

This is the modified key definition needed by dupelim4.xsl [3]:

  <xsl:key name="nodes-by-genid" match="/" use="generate-id()"/>
  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

With this definition the root node and every element, text, comment and processing-instruction node is covered. Of the seven node types in the XML data model [4], that leaves out two: attribute nodes are not matched by the pattern "node()" (they would need an additional key with match="@*"), and namespace nodes cannot be matched by any pattern.

Having an id-node-set of the form

  <id>some_id_1</id>
  <id>some_id_2</id>
  ...
  <id>some_id_k</id>

as in [3] allows the represented nodes to be "addressed" (efficiently) in the XML tree by the key() function. Every node-set made up of nodes covered by the key can be represented by such an id-node-set.

Result tree fragments of id-node-sets can be converted to id-node-sets by the exslt:node-set() function, as in [3]. This allows new id-node-sets to be generated iteratively.

I did a quick search for "XSLT pointer" and found hits for pointers in C implementations of XSLT processors, or for "XPointer".

Can representing the current node by

  <id><xsl:value-of select="generate-id()"/></id>

in conjunction with "bulk" conversion to the corresponding (real) node-set by

  key('nodes-by-genid',exslt:node-set($nodes)/id)

for an id-node-set $nodes be considered an XSLT "pointer" representation of the current node, as in C?

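To make the question concrete, here is a minimal sketch of the "take the address, then dereference" idea. It assumes an XSLT 1.0 processor with EXSLT support (such as xsltproc) and the ancestor3.xml sample; the variable name $ptrs and the plain-text output are only illustrative and are not taken from dupelim4.xsl.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt">

  <xsl:output method="text"/>

  <!-- index the root node and all child-axis nodes by their generated id -->
  <xsl:key name="nodes-by-genid" match="/" use="generate-id()"/>
  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

  <xsl:template match="/">
    <!-- "address-of": store one <id> per c element (illustrative name $ptrs) -->
    <xsl:variable name="ptrs">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>

    <!-- "dereference": bulk conversion of the <id>s back to the real nodes;
         key() is evaluated with the source root as context node, so it
         looks the ids up in the source document -->
    <xsl:for-each select="key('nodes-by-genid', exslt:node-set($ptrs)/id)">
      <xsl:value-of select="concat(name(), '=', ., '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against ancestor3.xml, this should list the four c elements with their text content, which shows that the <id> values really lead back to the original nodes and not to copies.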
[1] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor4.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim4.xsl
[4] http://www.w3.org/TR/xpath/#data-model

Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Registration court: Amtsgericht Stuttgart, HRB 243294


From: Hermann Stamm-Wilbrandt/Germany/IBM@IBMDE
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date: 08/27/2010 03:00 PM
Subject: Re: Fw: [xsl] Question on duplicate node elimination

Michael,

> ... Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

Yesterday's solution [1], based on the id() function, worked well. But I thought about it again, and the single-file solution below, based on applying the key() function twice for duplicate elimination, is much better:
* it does not need any separately created structure (like idcopy in [1])
* it is really short, just a few lines (not counting comments)
* it works on ALL major browsers (IE support by David Carlisle's trick [4])

Below are
* execution by xsltproc
* listing of dupelim3.xsl [3]
* listing of ancestor3.xml [2] (open that in a browser)

$ xsltproc dupelim3.xsl ancestor3.xml
<html><pre><h2>Duplicate node elimination by applying key() function twice</h2>
See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.

Tested to work with these browsers:
Chrome
Firefox
Internet Explorer
Opera
Safari
(clicking reload shows different ids)

ids(//*)
a     id2619817
+-b   id2619788
! +-c id2619830
! +-c id2619802
+-b   id2619245
! +-c id2619317
! +-c id2619321
<hr>
ids(//c):
<id>id2619830</id><id>id2619802</id><id>id2619317</id><id>id2619321</id>
<hr>
nodes="ids(//c)"<br>ids($nodes/ancestor::*):
<id>id2619817</id><id>id2619788</id><id>id2619245</id>
</pre></html>
$
$ cat dupelim3.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl"
>
  <xsl:output method="html"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

  <xsl:template match="/">

    <!-- initial node-set sample, represented by <id> nodes -->
    <xsl:variable name="nodes">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>

    <!-- do ancestor location step -->
    <xsl:variable name="result">

      <!-- application of "ancestor::*" on $nodes;
           $aux might contain duplicate id nodes -->
      <xsl:variable name="aux">
        <!-- use key() function to determine real nodes -->
        <xsl:for-each select="key('nodes-by-genid',exslt:node-set($nodes)/id)">
          <!-- location step on each real node -->
          <xsl:for-each select="ancestor::*">
            <!-- generate <id>s for new nodes -->
            <id><xsl:value-of select="generate-id()"/></id>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:variable>

      <!-- use key() function for duplicate elimination -->
      <xsl:for-each select="key('nodes-by-genid',exslt:node-set($aux)/id)">
        <!-- generate <id>s, now for unique new nodes -->
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>

<html><pre>
<h2>Duplicate node elimination by applying key() function twice</h2>
See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.

Tested to work with these browsers:
Chrome
Firefox
Internet Explorer
Opera
Safari
(clicking reload shows different ids)
<!-- node name vs genid output -->
<xsl:text>&#10;ids(//*)</xsl:text>
<xsl:for-each select="//*">
  <xsl:value-of select=
    "concat('&#10;',substring('! +-',5-2*count(ancestor::*)),name(),
            substring('    ',1+2*count(ancestor::*)),' ',generate-id())"/>
</xsl:for-each>
<xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

<!-- for verification -->
<xsl:text>ids(//c):&#10;</xsl:text>
<xsl:copy-of select="$nodes"/>
<xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

<!-- output of result -->
<xsl:text>nodes="ids(//c)"</xsl:text><br/>
<xsl:text>ids($nodes/ancestor::*):&#10;</xsl:text>
<xsl:copy-of select="$result"/>
<xsl:text>&#10;</xsl:text>
</pre></html>

  </xsl:template>

  <!-- from http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html -->
  <msxsl:script language="JScript" implements-prefix="exslt">
    this['node-set'] = function (x) {
      return x;
    }
  </msxsl:script>

</xsl:stylesheet>
$
$ cat ancestor3.xml
<?xml-stylesheet href="dupelim3.xsl" type="text/xsl"?>
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>
$

[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00291.html
[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim3.xml
[4] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html

Best wishes,

Hermann Stamm-Wilbrandt


From: Michael Kay <mike@xxxxxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date: 08/24/2010 02:17 PM
Subject: Re: Fw: [xsl] Question on duplicate node elimination

I haven't understood your logic in any detail, but I wonder if it suggests an alternative approach to the problem: namely, avoid creating RTFs entirely, at least for intermediate results. Instead, whenever you are evaluating an operation that returns a node-set, represent that node-set as a string containing the generate-id values of the nodes in the node-set, space-separated. Elimination of duplicates then reduces to an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
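
For comparison, the string-based approach described in the message above might look roughly like the following in XSLT 1.0. This is only a sketch under stated assumptions, not code from this thread: the template name dedupe-ids and its parameters ids and seen are hypothetical, and every id token is assumed to be followed by exactly one space.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- hypothetical helper: removes duplicate tokens from a string in
       which every token is followed by a single space -->
  <xsl:template name="dedupe-ids">
    <xsl:param name="ids"/>
    <!-- tokens accepted so far, each one delimited by spaces on both sides -->
    <xsl:param name="seen" select="' '"/>
    <xsl:choose>
      <xsl:when test="contains($ids, ' ')">
        <xsl:variable name="head" select="substring-before($ids, ' ')"/>
        <xsl:call-template name="dedupe-ids">
          <xsl:with-param name="ids" select="substring-after($ids, ' ')"/>
          <xsl:with-param name="seen">
            <xsl:choose>
              <!-- token already accepted: leave $seen unchanged -->
              <xsl:when test="contains($seen, concat(' ', $head, ' '))">
                <xsl:value-of select="$seen"/>
              </xsl:when>
              <!-- new token: append it, keeping the trailing delimiter -->
              <xsl:otherwise>
                <xsl:value-of select="concat($seen, $head, ' ')"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- no tokens left: emit the unique ids, space-separated -->
        <xsl:value-of select="normalize-space($seen)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- ids of ancestor::* for every c element; contains duplicates -->
    <xsl:variable name="raw">
      <xsl:for-each select="//c">
        <xsl:for-each select="ancestor::*">
          <xsl:value-of select="concat(generate-id(), ' ')"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>

    <xsl:call-template name="dedupe-ids">
      <xsl:with-param name="ids" select="string($raw)"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For ancestor3.xml this should reduce the eight collected ancestor ids to the three distinct ones (a and the two b elements) without any node-set() extension. The trade-off is that the result is a plain string: turning it back into real nodes still needs a tokenizing step before key() can be applied, because key() treats a string argument as a single key value.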