Re: [xsl] xsl:sort with msxml english language, danish characters, weird results

Subject: Re: [xsl] xsl:sort with msxml english language, danish characters, weird results
From: "W. Eliot Kimber" <ekimber@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 25 Oct 2004 10:50:42 -0500

Michael Kay wrote:

> The UCA is written as if there is a single correct
> answer, but there isn't.

The UCA doesn't define a particular collation sequence for any language; rather, it defines the requirements for how collation mechanisms should let you define the collation rules for a given language and script. The Unicode standard is very clear that collation is highly variable and that there is no single answer for any language or script. [Even for a single language you might have different collation rules for glossaries and indexes, for example.]


Java's built-in RuleBasedCollator class implements a collation mechanism that, as far as I know, conforms to the UCA in that it provides the functionality needed (although it may not fully address how to handle composed and decomposed characters; I'm not sure about the details there). The IBM ICU package provides a more complete implementation of the UCA, and the ICU4J package provides an alternative set of built-in language-specific collators that are more complete and accurate than those shipped with Java.
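
To make the RuleBasedCollator point concrete, here is a minimal sketch of a hand-written tailoring that puts the Danish letters after z in the order ae-ligature, o-slash, a-ring. The class name DanishSortSketch, the helper methods, and the sample words are my own illustration, and the rule string is a deliberately small assumption (only a-z plus the three extra letters, with none of the finer Danish conventions such as "aa"), not the rules that any JDK or ICU locale actually ships:

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

public class DanishSortSketch {

    // Builds "< a, A < b, B ... < z, Z" and then appends the three extra
    // Danish letters after z. "<" marks a primary (letter) difference and
    // "," a tertiary (case) difference in RuleBasedCollator rule syntax.
    static String danishRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(", ")
                 .append(Character.toUpperCase(c)).append(' ');
        }
        rules.append("< \u00E6, \u00C6 "); // ae ligature, lower and upper case
        rules.append("< \u00F8, \u00D8 "); // o with stroke
        rules.append("< \u00E5, \u00C5");  // a with ring above
        return rules.toString();
    }

    // Returns a copy of the input sorted with the hypothetical tailoring above.
    static List<String> sortDanish(List<String> words) throws ParseException {
        RuleBasedCollator collator = new RuleBasedCollator(danishRules());
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    // Returns a copy of the input sorted by plain String.compareTo,
    // i.e. by UTF-16 code unit values, for comparison.
    static List<String> sortByCodePoint(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(null); // null comparator means natural String ordering
        return sorted;
    }

    public static void main(String[] args) throws ParseException {
        // zebra, aal (a-ring + l), aeble (ae + ble), oel (o-slash + l), abe
        List<String> words =
            List.of("zebra", "\u00E5l", "\u00E6ble", "\u00F8l", "abe");
        System.out.println("code point order: " + sortByCodePoint(words));
        System.out.println("tailored order:   " + sortDanish(words));
    }
}
```

With the plain string comparison the a-ring word sorts before the ae and o-slash words, because that is where U+00E5 falls numerically; with the tailored collator it sorts last. If ICU4J is on the classpath, its own collators would be the more complete route than hand-written rules like these.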

Cheers,

E.
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@xxxxxxxxxxxxxxxxxxx
www.innodata-isogen.com
