RE: [xsl] Sorting Upper-Case first. Microsoft bug?

From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Tue, 5 Aug 2003 16:53:33 +0100
I don't know exactly what the intent of the XSLT 1.0 spec for case-order
was, but you need to read the definition in the light of the two
(non-normative) notes that follow it. 

The first says that two implementations may produce different results -
in other words, the spec does not attempt to be completely prescriptive
about the output order (therefore, by definition, this is not a
Microsoft non-conformance). 

The second note points to Unicode TR-10:
http://www.unicode.org/unicode/reports/tr10/index.html

Section 6.6 of this report recommends that implementations should allow
the user to decide whether lower-case should sort before or after
upper-case, and my guess is that the xsl:sort parameter was intended to
implement this recommendation.

In turn this should be read in the context of the collation algorithm
given in the report, which sorts strings in three phases:

- alphabetic ordering
- diacritic ordering
- case ordering

The key thing here is that case is only considered if the two strings
(as a whole) are the same except in case. So Xaaaa will sort before
xaaaa if upper-case comes first; but Xaaaa will always sort before
xaaab, regardless of case order.

It looks to me from this evidence as if Microsoft is implementing
something close to the Unicode TR10 algorithm.
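
As a rough illustration of the three-phase comparison described above, here is a minimal Python sketch. It is not the TR10 algorithm itself and not Microsoft's implementation: the function name collation_key and the upper_first flag are made up for this example, and "diacritics" are approximated by the combining marks left after NFD normalization. It only shows that case acts as a final tie-breaker, and that this reproduces the order reported for the Location values in the original question.

```python
import unicodedata


def collation_key(s, upper_first=True):
    """Hypothetical three-level sort key: letters, then diacritics, then case."""
    decomposed = unicodedata.normalize("NFD", s)
    base = [c for c in decomposed if not unicodedata.combining(c)]
    # Level 1: alphabetic ordering, ignoring diacritics and case.
    primary = [c.lower() for c in base]
    # Level 2: diacritics, approximated by the sequence of combining marks.
    secondary = [c for c in decomposed if unicodedata.combining(c)]
    # Level 3: case, consulted only when levels 1 and 2 are equal.
    tertiary = [
        0 if (c.isupper() if upper_first else c.islower()) else 1
        for c in base
    ]
    return (primary, secondary, tertiary)


# Location values from the original question, in document order.
locations = ["WA4135", "wA4131", "WA4133", "wA4136", "WA4136", "WA4138", "WA4139"]

print(sorted(locations, key=collation_key))
# ['wA4131', 'WA4133', 'WA4135', 'WA4136', 'wA4136', 'WA4138', 'WA4139']

print(sorted(["xaaab", "xaaaa", "Xaaaa"], key=collation_key))
# ['Xaaaa', 'xaaaa', 'xaaab']
```

Under these assumptions, only the pair WA4136 / wA4136 is affected by the case order; every other position is decided by the first level.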

Michael Kay

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Yago Alvarado
> Sent: 05 August 2003 11:52
> To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
> Subject: [xsl] Sorting Upper-Case first. Microsoft bug?
> 
> 
> Hi!
> 
>    I'm not quite sure whether this is a bug in the Microsoft 
> Parser (Microsoft XML 4.0) or it's me doing something wrong...
> 
> According to the W3C Recommendation:
> 
> ----------------------->8---------------------->8-----------------------
> case-order has the value upper-first or lower-first; this 
> applies when data-type="text", and specifies that upper-case 
> letters should sort before lower-case letters or vice-versa 
> respectively. For example, if lang="en", then A a B b are 
> sorted with case-order="upper-first" and a A b B are sorted 
> with case-order="lower-first". The default value is language 
> dependent.
> ----------------------->8---------------------->8-----------------------
> 
> 
> I'm trying to sort some XML and I want to show the elements 
> in upper case first, and then the ones in lower case. 
> See XML/XSL below:
> 
> 
> XML
> ---
> 
> <?xml version="1.0" encoding="iso-8859-1" ?>
> <recordset name="">
>     <row ReturnValue="0" Store_ID="7" Location="WA4135"  />
>     <row ReturnValue="0" Store_ID="5" Location="wA4131"  />
>     <row ReturnValue="0" Store_ID="6" Location="WA4133"  />
>     <row ReturnValue="0" Store_ID="8" Location="wA4136"  />
>     <row ReturnValue="0" Store_ID="9" Location="WA4136"  />
>     <row ReturnValue="0" Store_ID="10" Location="WA4138" />
>     <row ReturnValue="0" Store_ID="11" Location="WA4139" /> 
> </recordset>
> 
> 
> Please note I've changed some of the Location items to 'w' 
> rather than 'W' (Store_IDs 5 and 8)
> 
> 
> 
> XSL:
> ----
> 
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output method="xml" indent="yes"/>
> 
> 
> <xsl:template match="/">
> 	<recordset>
> 		<xsl:apply-templates select="recordset/row">
> 			<xsl:sort select="@Location" data-type="text" order="ascending" case-order="upper-first"/>
> 		</xsl:apply-templates>
> 	</recordset>
> </xsl:template>
> 
> <xsl:template match="row">
> 	<row>
> 		<xsl:for-each select="@*">
> 			<xsl:attribute name="{name()}">
> 				<xsl:value-of select="."/>
> 			</xsl:attribute>
> 		</xsl:for-each>
> 	</row>
> </xsl:template>
> 
> 
> </xsl:stylesheet>
> 
> 
> Now... I would expect to see the following result:
> 
> <?xml version="1.0" encoding="UTF-16"?>
> <recordset>
>    <row ReturnValue="0" Store_ID="6" Location="WA4133" />
>    <row ReturnValue="0" Store_ID="7" Location="WA4135" />
>    <row ReturnValue="0" Store_ID="9" Location="WA4136" />
>    <row ReturnValue="0" Store_ID="10" Location="WA4138" />
>    <row ReturnValue="0" Store_ID="11" Location="WA4139" />
>    <row ReturnValue="0" Store_ID="5" Location="wA4131" />
>    <row ReturnValue="0" Store_ID="8" Location="wA4136" />
> </recordset>
> 
> 
> Lower case 'w' items at the end.
> 
> 
> 
> However, what I am getting is:
> 
> 
> 
> <?xml version="1.0" encoding="UTF-16"?>
> <recordset>
>    <row ReturnValue="0" Store_ID="5" Location="wA4131" />
>    <row ReturnValue="0" Store_ID="6" Location="WA4133" />
>    <row ReturnValue="0" Store_ID="7" Location="WA4135" />
>    <row ReturnValue="0" Store_ID="9" Location="WA4136" />
>    <row ReturnValue="0" Store_ID="8" Location="wA4136" />
>    <row ReturnValue="0" Store_ID="10" Location="WA4138" />
>    <row ReturnValue="0" Store_ID="11" Location="WA4139" /> 
> </recordset>
> 
> 
> So it seems to be sorting independently of case first, 
> and then, when it finds items with the same characters, it 
> sorts them according to case, i.e. upper-case first, then 
> lower-case.
> 
> 
> 
> Is this the expected behaviour?
> Is it me missing something here?
> 
> 
> 
> Thanks,
> Yago
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 

