RE: [xsl] xsl:sort with msxml english language, danish characters, weird results

Subject: RE: [xsl] xsl:sort with msxml english language, danish characters, weird results
From: Bryan Rasmussen <bry@xxxxxxxxxx>
Date: Mon, 25 Oct 2004 12:52:57 +0200
-- 
Bryan Rasmussen




> It's up to the implementor whether they support lang="da" or not.
Actually, it turns out they do; these are the results with lang="da":
<?xml version="1.0" encoding="utf-8"?>
<e>aardvulf,
aardvark,
ålborg,
ødense,
æthling,
zebra,
zandinsky,
xerces,
tip,
odense,
fadøl,
eelburg,
</e>

At first I was confused and thought it was silly, but then I remembered that
"aa" is the old way of writing "å" in Danish, so aardvulf and aardvark sort as
if they began with å, the last letter of the Danish alphabet.
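The aa rule is easy to model. Below is a minimal Python sketch, assuming a simplified Danish alphabet in which æ, ø and å follow z and every "aa" counts as å; it is a toy table, not the collation data MSXML actually uses, and the helper name `danish_key` is hypothetical. Sorting the test words (spelled with æ, ø and å) in descending order with it gives the same list as the lang="da" run.

```python
# Hypothetical sketch: a simplified Danish sort key, not MSXML's real tables.
# Assumptions: letters limited to a-z plus æ, ø, å, and every "aa" is read
# as "å" (real Danish collation is smarter about compound words).
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"  # æ, ø, å come after z
RANK = {letter: position for position, letter in enumerate(DANISH_ALPHABET)}


def danish_key(word: str) -> list[int]:
    """Map a word to its letters' positions in the Danish alphabet."""
    normalized = word.lower().replace("aa", "å")  # old spelling of å
    return [RANK[letter] for letter in normalized]


words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

# order="descending", as in the stylesheet from the original question
for word in sorted(words, key=danish_key, reverse=True):
    print(word + ",")
```

Because "aa" collapses to å before ranking, aardvulf and aardvark outrank even ålborg (their second letter r beats l), which is why they open the descending list.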
> 
> For lang="en", it's up to the implementor how special characters are
> collated. This processor appears to be using a fairly commonly used
> algorithm in which characters are given a primary weight (a, b, c), and a
> secondary weight based on variations of the primary character (accents);
> the secondary weight of a character is taken into account only if the
> primary weights of all characters are equal.
> 

> 
> From your comment it seems you don't like this algorithm. It would be
> interesting to know why, and what algorithm you would prefer.
> 
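To make the primary/secondary description concrete, here is a toy Python version. The weight table (`SPECIAL`) and the function name are assumptions for illustration: å and ø share a primary weight with a and o but carry a nonzero secondary (accent) weight, and æ expands to the primaries a + e. Python compares the resulting tuples element by element, so the secondary weights only matter when the primary strings tie. Under these assumptions, the descending sort reproduces the lang="en" msxsl output from the original message.

```python
# Toy two-level collation, loosely following the primary/secondary weight
# scheme described above. The table is an illustrative assumption, not the
# weight data MSXML actually ships with.
SPECIAL = {
    "å": ("a", 1),   # a with ring: primary "a", nonzero secondary weight
    "ø": ("o", 1),   # o with stroke: primary "o", nonzero secondary weight
    "æ": ("ae", 1),  # ligature: expands to the two primaries "a" and "e"
}


def two_level_key(word: str) -> tuple[str, tuple[int, ...]]:
    """Return (primary string, secondary weights) for a word."""
    primary: list[str] = []
    secondary: list[int] = []
    for letter in word.lower():
        base, accent = SPECIAL.get(letter, (letter, 0))
        primary.append(base)
        secondary.extend([accent] * len(base))
    # Tuples compare left to right, so the secondary weights are consulted
    # only when the primary strings of two words are identical.
    return "".join(primary), tuple(secondary)


words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

for word in sorted(words, key=two_level_key, reverse=True):
    print(word + ",")
```

Note how ødense and odense tie on primary weight and are separated only by the accent, while æthling lands among the a words because æ expands to "ae".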

Well, actually, if they simply treated unknown characters as coming after the
end of the alphabet, the sort order would be closer to Danish, and the same goes
for Norwegian and, I believe, Swedish (I wonder how often that is the case).
What are the rules for accenting? I suppose most people, if you asked them what
ø was, would say it's an o with a slash through it, and that æ is an a and an e
stuck really close together (hence mnemonic entity names like &oslash; and
&aelig;), but is that the rule for determining what counts as an accented
character? "We asked 100 people and 90 gave the following answer"?
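The "unknown letters go last" idea can be sketched the same way. This is an assumption about what such a rule would look like: letters outside a-z sort after z and are ordered among themselves by Unicode code point, which puts å (U+00E5) before æ (U+00E6) and ø (U+00F8), and nothing special is done for "aa". The function name `unknown_last_key` is hypothetical.

```python
# Hypothetical "unknown letters go last" rule: a-z keep their usual order,
# and any other character sorts after z, ordered by its Unicode code point.
def unknown_last_key(word: str) -> list[tuple[int, int]]:
    """Key where (0, ...) marks a-z and (1, ...) marks everything else."""
    return [(0, ord(ch)) if "a" <= ch <= "z" else (1, ord(ch))
            for ch in word.lower()]


words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

for word in sorted(words, key=unknown_last_key, reverse=True):
    print(word + ",")
```

With this rule the words starting with æ, ø and å move to the top of the descending list, as in Danish, but their relative order follows code points rather than the Danish æ, ø, å sequence, and aardvulf and aardvark still end up at the bottom because "aa" is not read as å. So it is closer to Danish, though not identical.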

I wrote a long digression on how a character comes to be defined as accented,
but in the end it was really just me wondering what the rules are, and whether
there are any beyond "we think that looks sort of like something we're familiar
with."


> > -----Original Message-----
> > From: Bryan Rasmussen [mailto:bry@xxxxxxxxxx] 
> > Sent: 25 October 2004 10:14
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: [xsl] xsl:sort with msxml english language, danish 
> > characters, weird results
> > 
> > 
> > -- 
> > Bryan Rasmussen
> > 
> > Hi, I was doing some tests of sorting in various languages/charsets etc.,
> > and I came across the following irritation. Given XML like the following:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <words>
> > <word>aardvark</word>
> > <word>ødense</word>
> > <word>ålborg</word>
> > <word>aardvulf</word>
> > <word>odense</word>
> > <word>eelburg</word>
> > <word>zebra</word>
> > <word>zandinsky</word>
> > <word>tip</word>
> > <word>æthling</word>
> > <word>fadøl</word>
> > <word>xerces</word>
> > </words>
> > and an XSLT stylesheet like the following:
> > 
> > <?xml version="1.0" encoding="utf-8"?>
> > <xsl:stylesheet 
> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
> > <xsl:param name="sortby" select="'en'"/>
> > <xsl:output method="xml" encoding="utf-8"/>
> > <xsl:template match="/">
> > <e>
> > <xsl:for-each select="/words/word">
> > <xsl:sort data-type="text" select="." lang="{$sortby}" 
> > order="descending" 
> > case-order="upper-first"/>
> > <xsl:value-of select="." />,
> > </xsl:for-each>
> > </e>
> > </xsl:template>
> > </xsl:stylesheet>
> > 
> > the output in msxsl, the command-line tool for MSXML, is:
> > 
> > <?xml version="1.0" encoding="utf-8"?><e>zebra,
> > zandinsky,
> > xerces,
> > tip,
> > ødense,
> > odense,
> > fadøl,
> > eelburg,
> > ålborg,
> > æthling,
> > aardvulf,
> > aardvark,
> > </e>
> > 
> > This, by the way, is not the sort order for Danish characters. Setting the
> > sortby parameter to da does not work either (or at least does not give the
> > proper order), so, this being the case, I wonder what the reasoning is
> > behind the sort order I'm seeing when sortby is en.
> > 
> > Anybody know, or have any ideas?
