RE: [xsl] xsl:sort with msxml english language, danish characters, weird results

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 25 Oct 2004 13:26:07 +0100

> What are the rules for accenting? I suppose that most people if you
> asked them what ø was would say that's an o with a slash through it,
> and an æ was an a and an e stuck really close together, hence
> mnemonic entities, but is that the rule for determining what is an
> accented character? We asked 100 people and 90 gave the following
> answer?

I used the term "accent" very loosely. For the full gory detail, see the
Unicode Collation Algorithm [1]. I don't know if Microsoft follow this
precisely, but they are probably using the same principles.

As for how they collected the data - yes, they probably asked a few
non-randomly selected people, and they looked in some (possibly out of date)
textbooks, and when they got it badly wrong people complained and they
sometimes fixed it. There isn't a single right answer - different publishers
sort their dictionaries and indexes and phone books in different ways, and
none of them is wrong. The UCA is written as if there is a single correct
answer, but there isn't.
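
To make the "no single right answer" point concrete, here is a minimal Python
sketch that sorts the same words under two hand-written orderings. Both tables
are illustrative assumptions: neither is taken from the UCA or from what MSXML
actually does, and the function names are made up for this example. The first
treats æ, ø and å as variants of ae, o and a, roughly what an English reader
might expect; the second follows the Danish alphabet, where æ, ø and å come
after z.

```python
# Illustrative only: two hand-written orderings for the same words.
# Neither table is taken from the UCA or from MSXML.

# Assumption: an "English-style" reader treats these letters as variants
# of ae, o and a.
ENGLISH_STYLE = {"æ": "ae", "ø": "o", "å": "a"}

# The Danish alphabet places æ, ø, å after z, in that order.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def english_style_key(word):
    """Fold æ, ø, å onto their base letters before comparing."""
    return "".join(ENGLISH_STYLE.get(ch, ch) for ch in word.lower())


def danish_style_key(word):
    """Rank each letter by its position in the Danish alphabet.

    Raises ValueError for any character outside that alphabet.
    """
    return [DANISH_ALPHABET.index(ch) for ch in word.lower()]


words = ["øl", "zebra", "ost", "æble", "abe"]
english_order = sorted(words, key=english_style_key)
danish_order = sorted(words, key=danish_style_key)

print(english_order)
print(danish_order)
```

The same five words come out in two different orders, and a reader from each
tradition would consider their own order the correct one.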

Michael Kay
http://www.saxonica.com/

[1] http://www.unicode.org/reports/tr10/
