Re: [xsl] xsl:sort with msxml english language, danish characters, weird results
Subject: Re: [xsl] xsl:sort with msxml english language, danish characters, weird results
From: "W. Eliot Kimber" <ekimber@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 25 Oct 2004 14:28:21 -0500
Bryan Rasmussen wrote:
> now I don't suppose that there is a
> processor anywhere that supports sorting in pre print versions of languages but
> if it there was i guess it wouldn't matter because while you can set en-uk
> you can't set languages by historical time periods (actually I suppose that
> english that early would be an-sa or something right?) :)
In the Java XSLT world, at least with Saxon, you can implement custom
collators to support any collation rules you want, including those of
Old English or whatever.
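As a rough illustration of what a custom collator can look like on the Java side, here is a minimal sketch built on the JDK's java.text.RuleBasedCollator. The class name DanishIndexCollation and the rule string are made up for this example: the rules simply place ae, o-slash and a-ring after z and ignore finer points of Danish practice (such as the treatment of "aa"). How such a collator is hooked into a particular Saxon release is not shown; that depends on the processor version and its documentation.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DanishIndexCollation {
    // Illustrative rules only (an assumption, not an official tailoring):
    // the basic Latin letters, followed by ae, o-slash and a-ring.
    // "<" marks a primary (letter) difference, "," a tertiary (case) one.
    static final String RULES =
        "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I"
        + " < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R"
        + " < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z"
        + " < \u00E6, \u00C6"   // ae, AE
        + " < \u00F8, \u00D8"   // o-slash, O-slash
        + " < \u00E5, \u00C5";  // a-ring, A-ring

    // Builds the collator; a malformed rule string is a programming error here.
    static RuleBasedCollator collator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("bad collation rules", e);
        }
    }

    // Returns a sorted copy, leaving the caller's list untouched.
    static List<String> sorted(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(collator());
        return copy;
    }

    public static void main(String[] args) {
        // Sample index terms: oel, zebra, aal, abe, aeble (written with Danish letters).
        System.out.println(sorted(Arrays.asList(
            "\u00F8l", "zebra", "\u00E5l", "abe", "\u00E6ble")));
    }
}
```

With these rules the word starting with z sorts before the words starting with the three Danish letters, which is the opposite of what a plain English collation typically gives. Editorial variations then become a matter of editing the rule string rather than changing the stylesheet.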
As far as I'm concerned, any XSLT processor that does not provide a
clear and direct way to integrate arbitrary collators is not very useful
(but then almost all of my use of XSLT is to process technical documents
with indexes and glossaries in 50+ national languages and sometimes
idiosyncratic editorial rules for collation).
A textbook example of why collation has to be custom is Simplified
Chinese--it's collated based on the Pinyin transliteration of the
ideographic characters. For example, the character for "horse" is
pronounced "ma" in Mandarin, so it would sort under "M" in the index.
The problem is that there is no single authority for the transliteration
of all characters. Many characters have alternative pronunciations,
such as "b" or "v" depending on local usage. So there cannot be a single
authoritative collation rule for Simplified Chinese--it will always vary
based on the local transliteration practice or, sometimes, the opinion
of one person or another. You can see this in the Unicode "Unihan"
database, which provides lots of information about the Chinese
ideographs, including Mandarin and Cantonese transliterations. Many
characters have at least two Mandarin transliterations.
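To make the dependence on the transliteration table concrete, here is a small sketch that sorts index terms by a per-character reading. Everything in it is illustrative: the class name PinyinIndexSort and the three-entry table are invented for the example and are not taken from the Unihan data. The character U+884C is used because it has more than one common Mandarin reading ("xing" and "hang"), so switching the chosen reading changes where it lands in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PinyinIndexSort {
    // Hand-made sample table (an assumption for illustration, NOT Unihan data).
    // The caller picks which reading of U+884C the "local practice" prefers.
    static Map<Character, String> sampleReadings(String readingOf884C) {
        Map<Character, String> readings = new HashMap<>();
        readings.put('\u9A6C', "ma");           // "horse"
        readings.put('\u674E', "li");           // "plum", also a surname
        readings.put('\u884C', readingOf884C);  // "xing" or "hang"
        return readings;
    }

    // Sort key: the readings of each character, space-separated.
    // Characters missing from the table fall back to themselves.
    // Only handles characters in the Basic Multilingual Plane.
    static String key(String term, Map<Character, String> readings) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            String reading = readings.get(c);
            sb.append(reading != null ? reading : String.valueOf(c)).append(' ');
        }
        return sb.toString();
    }

    // Returns a copy sorted by reading, with the raw string as a tie-breaker.
    static List<String> sorted(List<String> terms, Map<Character, String> readings) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort((a, b) -> {
            int byReading = key(a, readings).compareTo(key(b, readings));
            return byReading != 0 ? byReading : a.compareTo(b);
        });
        return copy;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("\u9A6C", "\u884C", "\u674E");
        System.out.println(sorted(terms, sampleReadings("xing")));
        System.out.println(sorted(terms, sampleReadings("hang")));
    }
}
```

The same three terms come out in a different order depending on which reading the table assigns, which is why a fixed, built-in collation cannot settle the question for every publisher.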
I don't use MSXML, but I'm guessing that it relies entirely on Windows'
built-in regional settings for collation. That's simply not good enough,
at least for technical and academic documents.
W. Eliot Kimber
9390 Research Blvd, #410
Austin, TX 78759