Re: [xsl] XSLT 1.0: Problem grouping disparate unordered data

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 15 Mar 2006 13:14:17 -0500
Nick,

I would approach it like this (I hope I've understood your requirements exactly):

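<section>
  <xsl:for-each select="//ape[@priority &gt; 2]">
    <xsl:sort select="[fancy sort key ordering by type of ape]"/>
    <xsl:sort select="@priority" data-type="number" order="descending"/>
    <xsl:sort select="concat(substring(@date,7,4), substring(@date,4,2), substring(@date,1,2))"
              data-type="number" order="descending"/>
    <xsl:if test="position() &lt; 4">
      ... output ...
    </xsl:if>
  </xsl:for-each>
</section>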

This would reduce the problem to figuring out the sort key that would let you sort by Gorillas, Chimps, Orangs and Bonobos in that order.

Where's David Carlisle when you need him? He could probably concoct something using translate() to do this in 1.0.

How about <xsl:sort select="translate(@type,'GCOB','abcd')"/> as something simple, which ought nonetheless to work?
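
Here is a minimal sketch of how the translate() key might slot into a complete stylesheet, run against the sample input from the original post. The output element names (taken from @type via xsl:element) and the yyyymmdd date key are my assumptions about what is wanted, not something confirmed in the thread, and the translate() trick only holds while every type starts with a different capital letter.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- One section wraps whatever survives the position() test -->
    <section>
      <xsl:for-each select="apes/ape[@priority &gt; 2]">
        <!-- Species order: G, C, O, B become a, b, c, d -->
        <xsl:sort select="translate(@type, 'GCOB', 'abcd')"/>
        <xsl:sort select="@priority" data-type="number" order="descending"/>
        <!-- Assumes dates are always dd-mm-yyyy; rearranged to yyyymmdd -->
        <xsl:sort select="concat(substring(@date, 7, 4), substring(@date, 4, 2), substring(@date, 1, 2))"
                  data-type="number" order="descending"/>
        <!-- position() counts in sorted order, so this keeps the first three -->
        <xsl:if test="position() &lt; 4">
          <!-- Assumes @type is always a valid element name -->
          <xsl:element name="{@type}">
            <xsl:copy-of select="@priority | @date"/>
          </xsl:element>
        </xsl:if>
      </xsl:for-each>
    </section>
  </xsl:template>
</xsl:stylesheet>
```

On the sample data this should give the three gorillas, in the order shown in the first <section> of the original post.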

But you said your actual data isn't about apes, so knowing the solution to that particular problem won't actually help you. The tougher part of your problem is there: "The types are arbitrary names (not sortable)". A trick with translate() may be your best hope to make them so; without it, it's going to be tough to do in a single pass.
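
For names that share initial letters, one alternative to translate() is to look each type up in a delimited string that lists the types in the required order, and sort on how many characters precede the match. This is a different technique from the translate() suggestion above, offered only as a sketch: the $type-order variable and its '|' delimiter are hypothetical, and the list of names would have to be replaced with the real ones.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical: the required order of types, each wrapped in '|' -->
  <xsl:variable name="type-order"
                select="'|Gorilla|Chimpanzee|Orangutan|Bonobo|'"/>

  <xsl:template match="/">
    <section>
      <xsl:for-each select="apes/ape[@priority &gt; 2]">
        <!-- Key = number of characters before '|type|' in $type-order -->
        <xsl:sort select="string-length(substring-before($type-order, concat('|', @type, '|')))"
                  data-type="number"/>
        <xsl:sort select="@priority" data-type="number" order="descending"/>
        <!-- Assumes dates are always dd-mm-yyyy; rearranged to yyyymmdd -->
        <xsl:sort select="concat(substring(@date, 7, 4), substring(@date, 4, 2), substring(@date, 1, 2))"
                  data-type="number" order="descending"/>
        <xsl:if test="position() &lt; 4">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </section>
  </xsl:template>
</xsl:stylesheet>
```

Two caveats: the type names must not contain the delimiter, and a type missing from the list gets a key of 0, so it sorts alongside the first listed type.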

Cheers,
Wendell

At 01:00 PM 3/15/2006, you wrote:
Hi all,

Firstly, I'm using XSLT 1.0.

As my real dataset is very large and boring, I'm presenting this problem
in terms of an input document which represents the same problems in a less
noisy form.

This is the simplified example input:

<apes>
        <ape priority="5" date="25-01-2006" type="Chimpanzee" />
        <ape priority="1" date="26-01-2006" type="Gorilla"    />
        <ape priority="2" date="29-01-2006" type="Chimpanzee" />
        <ape priority="1" date="22-01-2006" type="Orangutan"  />
        <ape priority="3" date="22-01-2006" type="Bonobo"     />
        <ape priority="1" date="25-01-2006" type="Bonobo"     />
        <ape priority="4" date="24-01-2006" type="Gorilla"    />
        <ape priority="5" date="22-01-2006" type="Bonobo"     />
        <ape priority="4" date="26-01-2006" type="Chimpanzee" />
        <ape priority="4" date="25-01-2006" type="Gorilla"    />
        <ape priority="2" date="25-01-2006" type="Bonobo"     />
        <ape priority="3" date="25-01-2006" type="Orangutan"  />
        <ape priority="1" date="25-01-2006" type="Bonobo"     />
        <ape priority="3" date="27-01-2006" type="Gorilla"    />
        <ape priority="1" date="25-01-2006" type="Chimpanzee" />
        <ape priority="1" date="25-01-2006" type="Orangutan"  />
</apes>

The rules for grouping and sorting are fairly simple in the basic case:

Show Gorillas of priority greater than 2, sorted by priority, then by date
descending;
Show Chimpanzees of priority greater than 2, sorted by priority, then by
date descending;
Show Orangutans of priority greater than 2, sorted by priority, then by
date descending;
Show Bonobos of priority greater than 2, sorted by priority, then by date
descending;

and I'm done.

The current approach is basically to get all the apes of one kind:

<xsl:variable name="gorillas" select="apes/ape[@type='Gorilla' and
number(@priority) &gt; 2]"/>

and they are then sorted on priority and date; that code is
straightforward, apart from the substring-before shenanigans required to
get those annoying UK dates to sort correctly.
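
A sketch of what that per-species step might look like, for the gorillas only. The actual code was not posted, so the sort keys here are a guess at the "substring-before shenanigans": the date is split on '-' and sorted by year, then month, then day.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="gorillas"
                  select="apes/ape[@type='Gorilla' and number(@priority) &gt; 2]"/>
    <section>
      <xsl:for-each select="$gorillas">
        <xsl:sort select="@priority" data-type="number" order="descending"/>
        <!-- year: everything after the second '-' -->
        <xsl:sort select="substring-after(substring-after(@date, '-'), '-')"
                  data-type="number" order="descending"/>
        <!-- month: between the first and second '-' -->
        <xsl:sort select="substring-before(substring-after(@date, '-'), '-')"
                  data-type="number" order="descending"/>
        <!-- day: everything before the first '-' -->
        <xsl:sort select="substring-before(@date, '-')"
                  data-type="number" order="descending"/>
        <Gorilla priority="{@priority}" date="{@date}"/>
      </xsl:for-each>
    </section>
  </xsl:template>
</xsl:stylesheet>
```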

The above process is repeated for each species. Leaving out irrelevancies,
the results for the above document would be along the lines of:

<section>
    <Gorilla    priority="4" date="25-01-2006" />
    <Gorilla    priority="4" date="24-01-2006" />
    <Gorilla    priority="3" date="27-01-2006" />
</section>
<section>
    <Chimpanzee priority="5" date="25-01-2006" />
    <Chimpanzee priority="4" date="26-01-2006" />
</section>
<section>
    <Orangutan  priority="3" date="25-01-2006" />
</section>
<section>
    <Bonobo     priority="5" date="22-01-2006" />
    <Bonobo     priority="3" date="22-01-2006" />
</section>

However, I now have the further requirement that I return a single
<section> containing only the first three items (or fewer if fewer than
three match the "priority greater than 2" criterion). In this example that
would give just the three gorillas. If all but one of the gorillas
escaped, I would have to output that remaining gorilla followed by the two
chimpanzees; and if all the gorillas got away, with one of the chimps as
driver, I have to return the remaining chimp, the orangutan, and the
highest-priority bonobo.

Given that:

The types are arbitrary names (not sortable);
The initial dataset is not sorted on any field, and cannot be as it's
coming from an external provider;
I'm not permitted to use any extension functions like node-set(), as my
client may want to move to a different XSLT processor at a later date;

how can I achieve the necessary grouping and sorting?

I've been racking my brains over this one and I'm almost certain some
straightforward Muenchian grouping will suffice, but as we're 2 days away
from taking the system live and I'm dealing with a new bug report every
half hour or so spanning XSLT, HTML, CSS and JSP, I'm finding it hard to
get the time to really wrap my head round this one. Any help/advice would
be greatly appreciated.
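
For reference, a minimal sketch of Muenchian grouping by type over the sample input, using only standard XSLT 1.0 (xsl:key, key() and generate-id()). The key name apes-by-type is made up for illustration. Note that the groups come out in order of first appearance in the input (Chimpanzee, Bonobo, Gorilla, Orangutan here), not in the required species order, so grouping alone does not settle the ordering question.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index only the apes that pass the priority filter, by type -->
  <xsl:key name="apes-by-type" match="ape[@priority &gt; 2]" use="@type"/>

  <xsl:template match="/apes">
    <!-- Visit the first indexed ape of each type: one iteration per group -->
    <xsl:for-each select="ape[@priority &gt; 2]
                             [generate-id() = generate-id(key('apes-by-type', @type)[1])]">
      <section>
        <xsl:for-each select="key('apes-by-type', @type)">
          <xsl:sort select="@priority" data-type="number" order="descending"/>
          <!-- Assumes dates are always dd-mm-yyyy; rearranged to yyyymmdd -->
          <xsl:sort select="concat(substring(@date, 7, 4), substring(@date, 4, 2), substring(@date, 1, 2))"
                    data-type="number" order="descending"/>
          <!-- Assumes @type is always a valid element name -->
          <xsl:element name="{@type}">
            <xsl:copy-of select="@priority | @date"/>
          </xsl:element>
        </xsl:for-each>
      </section>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```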

For the curious: the real data has nothing to do with apes; I just thought
they'd brighten the place up. No simians were harmed in the creation of
this cry for help :-)

TIA,

Nick.
--
Nick Fitzsimons
http://www.nickfitz.co.uk/


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
