Re: [xsl] Sorting complex objects

Subject: Re: [xsl] Sorting complex objects
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 08 Mar 2009 09:20:45 -0400
At 2009-03-08 01:47 -0700, Mark Wilson wrote:
I need to transform a file of random-order complex items into an ordered file. I tried using nested 'for-each-group' instructions, but failed to get the results I wanted.

As I said, in the input file, the items are random and an item may occur more than once and have <SomeData> elements with different contents - both <SomeData> elements need to be preserved in the output file. The <Items> are ordered by <Heading>, then sequentially by the three <SubDivX> elements, and finally by <SomeData>. Everything sorts as strings.

I have been thinking about something like:

You were very close, all you were missing was the reconstitution of the <Item> element.


Also, where you simply concatenated your string components into your key, I introduced a delimiter between them. Most times a simple concatenation won't do, because the concatenated keys can sort in a different order than the components themselves would.

Consider the two sequences in order of string values, left to right:

  ('ab','xy')
  ('abcd','ef')

Simple concatenation will put these out of order as:

  ('abcd','ef') <!--key "abcdef"-->
  ('ab','xy')   <!--key "abxy"-->
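
The effect is easy to reproduce. What follows is only a minimal sketch, assuming an XSLT 2.0 processor; the variable $pairs and the elements <pair>, <a> and <b> are invented for the illustration. It sorts the two sequences once component by component and once by the simple concatenation of the components:

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<!--hypothetical test data: the two sequences discussed in the text-->
<xsl:variable name="pairs">
  <pair><a>ab</a><b>xy</b></pair>
  <pair><a>abcd</a><b>ef</b></pair>
</xsl:variable>

<xsl:template match="/" name="main">
  <!--two separate sort keys: components are compared one at a time-->
  <xsl:text>component by component:&#xa;</xsl:text>
  <xsl:for-each select="$pairs/pair">
    <xsl:sort select="a"/>
    <xsl:sort select="b"/>
    <xsl:value-of select="concat('  (',a,',',b,')&#xa;')"/>
  </xsl:for-each>
  <!--one sort key built by simple concatenation, no delimiter-->
  <xsl:text>by simple concatenation:&#xa;</xsl:text>
  <xsl:for-each select="$pairs/pair">
    <xsl:sort select="concat(a,b)"/>
    <xsl:value-of select="concat('  (',a,',',b,')&#xa;')"/>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Run against any input document (or started at the template named "main"), the first list should come out as (ab,xy) then (abcd,ef), and the second list in the reverse order, because "abcdef" sorts before "abxy".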

But, what to use for the delimiter? Unicode reserves U+FDD0 through U+FDEF as "non-characters", and as such, expects these not to be in any file interchanged by users. But they are, effectively, available as application-internal private-use code points. And they sort *after* the characters users typically put in XML documents, which makes them useful as delimiters in a concatenation.

But what about at the low end of the scale? Any valid XML character, just based on the property of being a valid XML character, is a character that might very well have been input by the user. So you have to use a character that is unlikely to be input by the user. Because of end-of-line-sequence normalization, I think that the carriage return is a very unlikely character to find in the input by the user, and it sorts *before* nearly every other character allowed in XML documents (only the tab and the line feed are lower).

Putting that together, if I wanted absent values to sort after present values, I would use the following with your data:

  group-by="concat(Heading,'&#xfdd0;',
                   SubDiv1,'&#xfdd0;',
                   SubDiv2,'&#xfdd0;',SubDiv3)">

... thereby absent values would be considered the COBOL "high values". If I wanted absent values to sort before present values, as I do, I would use the following with your data to mimic COBOL "low values", keeping the U+FDD0 just because of the very (very!) small chance of finding a CR in the input:

  group-by="concat(Heading,'&#xd;&#xfdd0;',
                   SubDiv1,'&#xd;&#xfdd0;',
                   SubDiv2,'&#xd;&#xfdd0;',SubDiv3)">

And I get the solution I think you need. But note that your proposed output seems a little out of order due to the trailing "s" on "Avenue".
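
The difference between the two delimiter choices can be sketched as follows. This is an illustration only, assuming an XSLT 2.0 processor: the three items are a cut-down version of the test data, the variable name $items is invented, and the explicit code point collation on <xsl:sort> is an assumption added here so that the comparison does not depend on the processor's default collation (the stylesheet in this post relies on the default).

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<!--hypothetical cut-down data: one item with absent subdivisions,
    and an "Avenue"/"Avenues" pair-->
<xsl:variable name="items">
  <Item><Heading>Prague</Heading></Item>
  <Item><Heading>Prague</Heading><SubDiv1>Avenue</SubDiv1>
        <SubDiv2>Crosswalks</SubDiv2></Item>
  <Item><Heading>Prague</Heading><SubDiv1>Avenues</SubDiv1></Item>
</xsl:variable>

<xsl:template match="/" name="main">
  <!--U+FDD0 alone: absent and shorter values act as "high values"-->
  <xsl:text>high delimiter only:&#xa;</xsl:text>
  <xsl:for-each select="$items/Item">
    <xsl:sort select="concat(Heading,'&#xfdd0;',
                             SubDiv1,'&#xfdd0;',SubDiv2)"
      collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
    <xsl:value-of select="string-join((Heading,SubDiv1,SubDiv2),'/')"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:for-each>
  <!--CR then U+FDD0: absent and shorter values act as "low values"-->
  <xsl:text>low then high delimiter:&#xa;</xsl:text>
  <xsl:for-each select="$items/Item">
    <xsl:sort select="concat(Heading,'&#xd;&#xfdd0;',
                             SubDiv1,'&#xd;&#xfdd0;',SubDiv2)"
      collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
    <xsl:value-of select="string-join((Heading,SubDiv1,SubDiv2),'/')"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Under code point comparison the first list should read Prague/Avenues, Prague/Avenue/Crosswalks, Prague, and the second should read Prague, Prague/Avenue/Crosswalks, Prague/Avenues.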

I hope this helps.

. . . . . . . Ken

t:\ftemp>type mark.xml
<List>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Avenues</SubDiv1>
   <SubDiv2>Crosswalks</SubDiv2>
   <SomeData>12</SomeData>
</Item>
<Item >
   <Heading>Prague</Heading>
   <SomeData>1</SomeData> <!-- Needs to be consolidated-->
</Item>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Avenues</SubDiv1>
   <SomeData></SomeData>
</Item>
<Item>
<Heading>Prague</Heading>
<SubDiv1>Avenue</SubDiv1>
<SubDiv2>Crosswalks</SubDiv2>
<SubDiv3>Dangerous</SubDiv3>
<SomeData></SomeData>
</Item>
<Item>
   <Heading>Bonn</Heading>
   <SubDiv1>Avenue</SubDiv1>
   <SubDiv2>Crosswalks</SubDiv2>
   <SubDiv3>Dangerous</SubDiv3>
   <SomeData></SomeData>
</Item>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Streets</SubDiv1>
   <SomeData></SomeData>
</Item>
<Item>
   <Heading>Washington</Heading>
   <SomeData></SomeData>
</Item>
<Item >
   <Heading>Prague</Heading>
   <SomeData>2</SomeData><!-- Needs to be consolidated-->
</Item>
</List>

t:\ftemp>call xslt2 mark.xml mark.xsl
<?xml version="1.0" encoding="UTF-8"?>
<Item>
   <Heading>Bonn</Heading>
   <SubDiv1>Avenue</SubDiv1>
   <SubDiv2>Crosswalks</SubDiv2>
   <SubDiv3>Dangerous</SubDiv3>
   <SomeData/>
</Item>
<Item>
   <Heading>Prague</Heading>
   <SomeData>1</SomeData>
   <SomeData>2</SomeData>
</Item>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Avenue</SubDiv1>
   <SubDiv2>Crosswalks</SubDiv2>
   <SubDiv3>Dangerous</SubDiv3>
   <SomeData/>
</Item>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Avenues</SubDiv1>
   <SomeData/>
</Item>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Avenues</SubDiv1>
   <SubDiv2>Crosswalks</SubDiv2>
   <SomeData>12</SomeData>
</Item>
<Item>
   <Heading>Prague</Heading>
   <SubDiv1>Streets</SubDiv1>
   <SomeData/>
</Item>
<Item>
   <Heading>Washington</Heading>
   <SomeData/>
</Item>
t:\ftemp>type mark.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:template match="List">
  <xsl:for-each-group select="Item"
                      group-by="concat(Heading,'&#xd;&#xfdd0;',
                                       SubDiv1,'&#xd;&#xfdd0;',
                                       SubDiv2,'&#xd;&#xfdd0;',SubDiv3)">
    <xsl:sort select="current-grouping-key()"/>
    <!--reconstitute the Item-->
    <Item>
      <xsl:copy-of select="Heading,SubDiv1,SubDiv2,SubDiv3"/>
      <!--consolidating the <SomeData> elements-->
      <xsl:for-each select="current-group()">
        <xsl:sort select="SomeData"/>
        <xsl:copy-of select="SomeData"/>
      </xsl:for-each>
    </Item>
  </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>

t:\ftemp>rem Done!



--
XQuery/XSLT training in Prague, CZ 2009-03 http://www.xmlprague.cz
XQuery/XSLT/XSL-FO training in Los Angeles/Anaheim - 2009-06-01/10
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
