Re: [xsl] Question on duplicate node elimination

Subject: Re: [xsl] Question on duplicate node elimination
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Sun, 22 Aug 2010 23:36:00 +0100
> But how could the algorithm step of "duplicate elimination" be done?
> How can the duplicates be determined and removed, correctly?


What makes you think it would be difficult?


Of course, a processor needs some way to decide whether two nodes are identical/distinct. Given such a mechanism, it's not difficult to come up with algorithms that eliminate duplicate nodes.

In practice, when XPath 1.0 is used as part of XSLT 1.0, the XPath requirement to eliminate duplicates can always be combined with the XSLT requirement to deliver the node-set sorted in document order. So the natural way to eliminate duplicates is as part of the sorting process.
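As a rough illustration of folding duplicate elimination into the sort, here is a minimal Python sketch. It is not taken from any real processor: the `Node` class and its `order` field (a unique document-order position assigned when the tree is built) are assumptions made for the example. Once the nodes are sorted on that key, duplicates sit next to each other, so one pass that compares each node with its predecessor removes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical representation: 'order' is the node's position in
    # document order and is assumed to be unique per node, so it also
    # serves as the node's identity.
    order: int
    name: str


def sort_and_dedupe(nodes):
    """Return the nodes in document order with duplicates removed."""
    result = []
    for node in sorted(nodes, key=lambda n: n.order):
        # After sorting, a duplicate can only be adjacent to its twin.
        if not result or result[-1].order != node.order:
            result.append(node)
    return result


# The same node may be reached more than once during evaluation.
a, b, c = Node(1, "a"), Node(2, "b"), Node(3, "c")
selected = [c, a, c, b, a]
deduped = sort_and_dedupe(selected)
```

Here `deduped` holds `a`, `b`, `c` once each, in document order. A real processor would compare node identity in whatever way its tree model provides; the integer key is only a stand-in.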

For performance, the most important technique is static analysis to identify those path expressions where the sort (and duplicate elimination) are unnecessary. For example, this is the case for the expression /a/b/c if it is evaluated either (a) using nested loops, or (b) by scanning the entire source document looking for nodes that match this pattern. For the expression //x//y, a sort is necessary if the evaluation uses nested loops, but not if it uses a whole-document scan and pattern matching. Remember that the evaluation techniques used internally may be very different from the descriptions you find in explanations of the semantics of the language.
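To see why //x//y needs the sort under nested loops but not under a document scan, consider a small Python sketch. The tree model and the function names are assumptions invented for the illustration and do not describe how any particular processor works. The sample tree nests one x inside another, so the inner y is reachable from both x elements.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def descendants(node):
    """Yield the descendants of node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def has_ancestor(node, name):
    """True if some ancestor of node has the given name."""
    parent = node.parent
    while parent is not None:
        if parent.name == name:
            return True
        parent = parent.parent
    return False


def nested_loops(doc):
    """//x//y as naive nested loops, without sorting or deduplication."""
    out = []
    for x in descendants(doc):
        if x.name == "x":
            for y in descendants(x):
                if y.name == "y":
                    out.append(y)
    return out


def document_scan(doc):
    """//x//y as one pass over the document, testing each node."""
    return [n for n in descendants(doc)
            if n.name == "y" and has_ancestor(n, "x")]


# <x><y/><x><y/></x><y/></x>
y1, y2, y3 = Node("y"), Node("y"), Node("y")
inner_x = Node("x", [y2])
outer_x = Node("x", [y1, inner_x, y3])
doc = Node("#document", [outer_x])

loop_result = nested_loops(doc)
scan_result = document_scan(doc)
```

With this tree, `loop_result` contains y2 twice (once via each x) and the second occurrence comes after y3, so it needs both a sort and duplicate removal. `scan_result` visits each node exactly once in document order, so it is already sorted and free of duplicates.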

The way you have phrased the question suggests that you might be worrying about how exslt:node-set() affects the process. Simple answer - it doesn't.

Michael Kay
Saxonica
