Re: [xsl] Getting a distinct list of node names

Subject: Re: [xsl] Getting a distinct list of node names
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 15 Dec 2003 20:14:52 -0500
At 2003-12-15 19:07 -0500, Wendell Piez wrote:
In fact, it's the first step in the oft-recommended Muenchian grouping technique. (De-duplicate nodes by grouping criterion, then group by the set of de-duplicated "flag-bearer" nodes.)

A common way to do this is simply to compare a node's value to other nodes (here it's the value of the node's name that you care about):

node:definition/*[not(name() = name(preceding-sibling::*))]

...which will have poor performance on a large set of nodes (but you get the idea).
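One caveat worth checking with the axis-based expression: in XPath 1.0, name() applied to a node-set reports only the name of the node that is first in document order, so the predicate above compares against a single preceding sibling rather than all of them. A minimal sketch of a variation that tests every preceding sibling follows. The namespace URI bound to the "node" prefix is a placeholder assumption, since the original thread does not show it.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the URI bound to "node" is a placeholder, not from the thread. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:node="urn:example:node">

  <xsl:output method="text"/>

  <xsl:template match="node:definition">
    <xsl:for-each select="*">
      <!-- current() is the child being visited by xsl:for-each; keep it
           only when no earlier sibling has the same name -->
      <xsl:if test="not(preceding-sibling::*[name() = name(current())])">
        <xsl:value-of select="name()"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Suppress stray text copied by the built-in template rules -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Each child still walks its preceding siblings, so the performance concern on large sets of nodes applies to this form as well.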

You could also do

<xsl:key name="nodes-by-name" match="*" use="name()"/>

and then call for

node:definition/*[count(.|key('nodes-by-name',name())[1]) = 1]

which should work better on large sets of input.
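For reference, the key declaration and the selecting expression might sit together in a stylesheet roughly as follows. This is a sketch, not a tested solution for the original poster's data: the namespace URI for "node" is a placeholder assumption, and because the key indexes every element in the document, a child is selected only if it is the first element of that name anywhere in the document.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the URI bound to "node" is a placeholder, not from the thread. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:node="urn:example:node">

  <xsl:output method="text"/>

  <!-- Index every element in the document by its name -->
  <xsl:key name="nodes-by-name" match="*" use="name()"/>

  <xsl:template match="/">
    <!-- Keep a child only when it is the first node the key returns for
         its own name; count(. | x) = 1 holds when . and x are the same node -->
    <xsl:for-each select="//node:definition/*[count(. | key('nodes-by-name', name())[1]) = 1]">
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```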

Ah, it looks like Ken has another solution....

When I considered the three approaches to grouping: axis-based, key-based and variable-based, I discounted axis-based for the reasons you cite.


But I discounted key-based approaches because keys have document-wide scope, and it wasn't clear whether the original poster's element types would be the same globally or unique in each context. Also, I probably would have populated a key table using the name of each element's parent, so that looking up an element name in the table would pull out the child nodes of the elements with that name. Thinking about it a bit more, I would probably use the parent's generated identifier to accommodate context but, again, with keys being document-wide in scope, I didn't think at the time that this was an option (though I wonder now whether a table of parent ids might be ideal).
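One possible reading of the "table of parent ids" idea is a key whose value combines the parent's generated identifier with the element's own name, so that de-duplication happens separately under each parent. The sketch below is an interpretation, not something spelled out in this thread; the key name and the namespace URI for "node" are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Sketch only: key name and namespace URI are illustrative assumptions. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:node="urn:example:node">

  <xsl:output method="text"/>

  <!-- Key value is "<parent id>|<element name>", so equal names under
       different parents land in different buckets -->
  <xsl:key name="children-by-parent-and-name" match="*"
           use="concat(generate-id(..), '|', name())"/>

  <xsl:template match="node:definition">
    <!-- Select each child that is the first member of its own bucket -->
    <xsl:for-each select="*[generate-id() =
        generate-id(key('children-by-parent-and-name',
                        concat(generate-id(..), '|', name()))[1])]">
      <xsl:value-of select="name()"/>
      <!-- The selection is a node-set, so position() and last() count
           only the distinct names -->
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress stray text copied by the built-in template rules -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The key table is still built for the whole document, but each lookup is confined to one parent.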

Looking again at your proposal, I wonder: if a child of an element first occurred (in document scope) as a child of a preceding element, would it be detected? It needs more thought on my part.

So in the short time I took I was left with the variable-based approach, since one can easily control the scope of nodes in which to find unique values. The only drawback I find with the variable-based approach is that the result isn't a set of nodes, only a construction in the result tree. This sometimes (always?) makes generating separators difficult.
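This message does not spell out the variable-based approach, so the following is only a sketch of one common shape for it, under that assumption: bind the nodes in scope to a variable, then emit a node only when it is the first in the variable with its name. The variable name and the namespace URI for "node" are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Sketch only: variable name and namespace URI are illustrative assumptions. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:node="urn:example:node">

  <xsl:output method="text"/>

  <xsl:template match="node:definition">
    <!-- The variable fixes the scope in which names must be unique -->
    <xsl:variable name="scope" select="*"/>
    <xsl:for-each select="$scope">
      <!-- Emit only the first node in scope that carries this name -->
      <xsl:if test="generate-id(.) =
                    generate-id($scope[name() = name(current())][1])">
        <xsl:value-of select="name()"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Suppress stray text copied by the built-in template rules -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Inside the xsl:for-each, position() and last() count every node in the variable rather than only the ones that pass the test, which is one way the separator difficulty shows up.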

But ... the requirements are underspecified, so perhaps your approach would in the long run be the best.

................... Ken


-- North America (Washington, DC): 3-day XSLT/2-day XSL-FO 2004-02-09 Instructor-led on-site corporate, government & user group training for XSLT and XSL-FO world-wide: please contact us for the details

G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
ISBN 0-13-065196-6                       Definitive XSLT and XPath
ISBN 0-13-140374-5                               Definitive XSL-FO
ISBN 1-894049-08-X   Practical Transformation Using XSLT and XPath
ISBN 1-894049-11-X               Practical Formatting Using XSL-FO
Member of the XML Guild of Practitioners:     http://XMLGuild.info
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/s/bc


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


