Subject: Re: [xsl] Question on duplicate node elimination From: Michael Kay <mike@xxxxxxxxxxxx> Date: Mon, 23 Aug 2010 10:57:47 +0100 |
Michael Kay Saxonica
What makes you think it would be difficult?
It looks at least not obvious to me and I thought I better ask here before reinventing the wheel.
Of course, a processor needs some way to decide whether two nodes are
identical/distinct. Given such a mechanism, it's not difficult to come
up with algorithms that eliminate duplicate nodes.
The question is how this mechanism might look like in XPath 1.0 plus exslt:node-set function, see below.
In practice, when XPath 1.0 is used as part of XSLT 1.0, the XPath
requirement to eliminate duplicates can always be combined with the XSLT
requirement to deliver the node-set sorted in document order.
OK, in thread [1] it was asked how to dynamically evaluate XPath expressions. Dimitre's posting [2] of a nametest only single node result XSLT 1.0 solution and the enhancement of that to handle result sets in [3] made me start my own XPath evaluator 10 days ago [4]. While it is complete on the Location step side it has near zero support for predicates and cannot handle duplicate node elimination (last sample).
In posting [5] I learned that XSLTForm project has its own built in (not complete) XPath evaluator. Editing file xpath.xml allows to test some XPath expressions and it turned out that this XPath evaluator is not able to do duplicate node elimination as well.
Searching Google for "XPath Parser" did not help. But searching the list archives of xsl-list I found the very interesting posting [6] of Jeni Tenison on his XPath parser xpath.xsl generating a parse tree and xpath-doc.xsl generating an explanation of a XPath statement.
His xpath.xsl seems to be pretty complete (as an indicator the call graph for the named templates [7], generated by stylesheet [10] with dot/graphwiz [9] seems to match the spec quite good).
So my idea was to just start with xpath-doc.xsl and instead of generating an explanation make it do the actual XPath evaluation instead. And this is the point where duplicate node elimination is needed.
So the
natural way to eliminate duplicates is as part of the sorting process.
Which sorting process -- a global one or the one for the incremental steps in working on the XPath parse tree?
ANY help is really appreciated!
Take this XPath as a more complex example (the duplicate nodes may exist on any level in the input ducument): //c/ancestor::*
Jeni's xpath-doc.xsl generates this explanation: the root node's descendant<span class="name">c</span> elements's ancestor elements
And his xpath.xsl generates this parse tree: <path> <root/> <step axis="descendant-or-self"> <node/> </step> <step axis="child"> <element name="c" namespace="default"/> </step> <step axis="ancestor"> <element namespace="any"/> </step> </path>
Having a solution would enable the major browsers (with the technique of David Carlisle [8]) to dynamically evaluate Xpath expressions without having a dyn:evaluate() implementation.
[1]
http://stackoverflow.com/questions/3015942/retrieving-xml-node-from-a-path-specified-in-an-attribute-value-of-another-node
[2]
http://stackoverflow.com/questions/3015942/retrieving-xml-node-from-a-path-specified-in-an-attribute-value-of-another-node/3017752#3017752
[3]
http://stackoverflow.com/questions/3015942/retrieving-xml-node-from-a-path-specified-in-an-attribute-value-of-another-node/3485079#3485079
[4] http://stamm-wilbrandt.de/en/xsl-list/xpath1/v0.3/
[5]
http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00203.html
[6]
http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200212/msg00315.html
[7] http://stamm-wilbrandt.de/en/xsl-list/xpath.dot.pdf
[8] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
[9] http://www.graphviz.org/
[10]
$ cat dotnc.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
<xsl:output method="text"/>
<xsl:template match="/"> digraph G { size="7.5,10"; rankdir=TB; concentrate=true;
<xsl:for-each select="//xsl:template[@name]"> "<xsl:value-of select="@name"/>"[shape=box]; <xsl:if test=".//xsl:apply-templates"> "<xsl:value-of select="@name"/>" -> "xsl:apply-templates"; </xsl:if> </xsl:for-each>
<xsl:for-each select="//xsl:call-template"> "<xsl:value-of select="string(ancestor::xsl:template/@name)"/>" -> "<xsl:value-of select="@name"/>"; </xsl:for-each> } </xsl:template>
</xsl:stylesheet> $ $ xsltproc dotnc.xsl xpath.xsl>xpath.dot $ dot -Tps xpath.dot>xpath.dot.ps $ ps2pdf xpath.dot.ps $
Mit besten Gruessen / Best wishes,
Hermann Stamm-Wilbrandt Developer, XML Compiler, L3 WebSphere DataPower SOA Appliances ---------------------------------------------------------------------- IBM Deutschland Research& Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
From: Michael Kay<mike@xxxxxxxxxxxx> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx Date: 08/23/2010 12:36 AM Subject: Re: [xsl] Question on duplicate node elimination
What makes you think it would be difficult?But how could the algorithm step of "duplicate elimination" be done? How can the duplicates be determined and removed, correctly?
Of course, a processor needs some way to decide whether two nodes are identical/distinct. Given such a mechanism, it's not difficult to come up with algorithms that eliminate duplicate nodes.
In practice, when XPath 1.0 is used as part of XSLT 1.0, the XPath requirement to eliminate duplicates can always be combined with the XSLT requirement to deliver the node-set sorted in document order. So the natural way to eliminate duplicates is as part of the sorting process.
For performance, the most important technique is static analysis to identify those path expressions where the sort (and duplicate elimination) are unnecessary. For example, this is the case for the expression /a/b/c if it is evaluated either (a) using nested loops, or (b) by scanning the entire source document looking for nodes that match this pattern. For the expression //x//y, a sort is necessary if the evaluation uses nested loops, but not if it uses a whole-document scan and pattern matching. Remember that the evaluation techniques used internally may be very different from the descriptions you find in explanations of the semantics of the language.
The way you have phrased the question suggests that you might be worrying about how exslt:node-set() affects the process. Simple answer - it doesn't.
Michael Kay Saxonica
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] Question on duplicate nod, Hermann Stamm-Wilbra | Thread | Re: [xsl] Question on duplicate nod, Hermann Stamm-Wilbra |
Re: [xsl] Question on duplicate nod, Hermann Stamm-Wilbra | Date | Re: [xsl] Question on duplicate nod, alain . couthures |
Month |