Re: [xsl] Stuck with select distinct

Subject: Re: [xsl] Stuck with select distinct
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 10 Nov 2008 15:52:13 -0500

Mark,

One can frequently use raw XPath to achieve the same end as using a key. It's just that for certain applications in certain processors, this can give very poor performance.

In order to get all 'colour' values, as you know, you can traverse "//colour".

In order to exclude Red and Blue, use a predicate:

//colour[not(.='Red' or .='Blue')]

You have also stipulated that "rgb value is not FFFFFF". But 'colour' sometimes has several 'rgb' element siblings. Your source code sample suggests it's the 'rgb' value directly following each 'colour' that is a concern, so:

//colour[not(.='Red' or .='Blue')][not(following-sibling::rgb[1]='FFFFFF')]

which returns all colour elements whose value is not 'Red' or 'Blue' and whose directly succeeding 'rgb' element's value is not 'FFFFFF'.
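
For reference, here is a minimal sketch of a complete stylesheet that binds that expression to the $colours variable used in what follows, and simply lists what it selects. It assumes your 'page' elements sit inside a single wrapper element (your sample shows none, so that part is my assumption); the text output is only there for checking the result:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Every colour that passes both filters, duplicates included -->
  <xsl:variable name="colours"
      select="//colour[not(. = 'Red' or . = 'Blue')]
                      [not(following-sibling::rgb[1] = 'FFFFFF')]"/>

  <xsl:template match="/">
    <xsl:for-each select="$colours">
      <!-- Print the colour and the rgb value that directly follows it -->
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="following-sibling::rgb[1]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against your sample this should list "Green 00FF00" twice (once for page 1, once for page 3), so the set still needs de-duplicating.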

To pull the distinct values from this set without a key, the easiest thing is to perform a test within your for-each (or your matched template), as in

<xsl:for-each select="$colours">
  <xsl:variable name="pos" select="position()"/>
  <xsl:if test="not(. = $colours[position() &lt; $pos])">
    ...

... which is very direct and useful for many purposes.
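
Spelled out in full, that approach might look like the sketch below. The wrapper element around your 'page' elements and the text output are assumptions of mine for the sake of a runnable example; what you do inside the xsl:if is up to you:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Filtered colours in document order, duplicates included -->
  <xsl:variable name="colours"
      select="//colour[not(. = 'Red' or . = 'Blue')]
                      [not(following-sibling::rgb[1] = 'FFFFFF')]"/>

  <xsl:template match="/">
    <xsl:for-each select="$colours">
      <!-- Capture position() here: inside the predicate below it would
           refer to the node being tested, not to the current node -->
      <xsl:variable name="pos" select="position()"/>
      <!-- Skip this colour if an earlier member of $colours has the same value -->
      <xsl:if test="not(. = $colours[position() &lt; $pos])">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With your sample this should emit a single "Green". Note that $pos counts the duplicates too, so it is a position within $colours, not within the distinct values.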

Not for all, however. (For example, if positions are going to be important within the for-each.) Then you have to filter $colours itself to keep only distinct values, which means brute-force testing using the predicates again:

$colours[not(. = preceding::colour[not(.='Red' or .='Blue')]
                  [not(following-sibling::rgb[1]='FFFFFF')])]
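
As a sketch only, here is that filter bound to a second variable so that position() inside the for-each counts distinct values. The variable name 'distinct', the wrapper element around your 'page' elements, and the numbered text output are all my own assumptions:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Filtered colours, duplicates included -->
  <xsl:variable name="colours"
      select="//colour[not(. = 'Red' or . = 'Blue')]
                      [not(following-sibling::rgb[1] = 'FFFFFF')]"/>

  <!-- Keep a colour only if no earlier colour passing the same
       filters has the same value ('distinct' is a hypothetical name) -->
  <xsl:variable name="distinct"
      select="$colours[not(. = preceding::colour
                                 [not(. = 'Red' or . = 'Blue')]
                                 [not(following-sibling::rgb[1] = 'FFFFFF')])]"/>

  <xsl:template match="/">
    <xsl:for-each select="$distinct">
      <!-- position() now numbers the distinct values only -->
      <xsl:value-of select="position()"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The filters have to be repeated on the preceding axis because an XSLT 1.0 path cannot step back into $colours from there; that repetition is where the cost comes from.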

... which is so outlandish (and potentially so poor in performance) that one wonders why you'd "prefer to avoid using keys if possible", since a key is so straightforward:

<xsl:key name="c" use="."
  match="colour[not(.='Red' or .='Blue')]
               [not(following-sibling::rgb[1]='FFFFFF')]"/>

and then select="$colours[generate-id()=generate-id(key('c',.)[1])]"/>
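
Put together, a minimal sketch of the key-based version could look like this. As before, the wrapper element around your 'page' elements and the text output are assumptions made only to give you something runnable:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index the filtered colour elements by their string value -->
  <xsl:key name="c" use="."
      match="colour[not(. = 'Red' or . = 'Blue')]
                   [not(following-sibling::rgb[1] = 'FFFFFF')]"/>

  <!-- The same filtered set, duplicates included -->
  <xsl:variable name="colours"
      select="//colour[not(. = 'Red' or . = 'Blue')]
                      [not(following-sibling::rgb[1] = 'FFFFFF')]"/>

  <xsl:template match="/">
    <!-- Keep a colour only if it is the first node the key returns for its value -->
    <xsl:for-each select="$colours[generate-id() = generate-id(key('c', .)[1])]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With your sample this should again emit a single "Green". Since the key only ever contains colours that pass the filters, selecting from "//colour" instead of $colours should give the same result here.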

Cheers,
Wendell

At 11:36 AM 11/8/2008, you wrote:
Hi All

I'm trying to get a list of distinct items from an XML. I've done this many times using a predicate containing a preceding axis, but this one has got me stumped:

<page>
        <front_back>F</front_back>
        <page_no>1</page_no>
        <colours>
                <colour>Red</colour>
                <rgb>00FFFF</rgb>
                <colour>Green</colour>
                <rgb>00FF00</rgb>
                <colour>Blue</colour>
                <rgb>FFFF00</rgb>
        </colours>
</page>
<page>
        <front_back>F</front_back>
        <page_no>2</page_no>
        <colours>
                <colour>Green</colour>
                <rgb>FFFFFF</rgb>
        </colours>
</page>
<page>
        <front_back>F</front_back>
        <page_no>3</page_no>
        <colours>
                <colour>Green</colour>
                <rgb>00FF00</rgb>
        </colours>
</page>

I need to return a nodeset with a list of DISTINCT colour nodes, that I can then process in a for-each element.

The other conditions for selection are:

    colour is not Red or Blue
    rgb value is not FFFFFF


I somehow need to combine the following predicates (I think)


    colours/colour[. != 'Red' and . != 'Blue']
    colours/rgb[. != 'FFFFFF']
    colours/colour[not(. = preceding-sibling::colour)]

I'm stuck with XSL 1.0 and would like to avoid using keys if possible

Any suggestions greatly appreciated

Regards

Mark


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
