Re: [xsl] Use the xml:lang attribute to set the collation?

Subject: Re: [xsl] Use the xml:lang attribute to set the collation?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Mon, 07 Jan 2013 15:44:01 +0000
Roger, have you read up on this subject? It's very thoroughly covered in my book (XSLT 2.0 and XPath 2.0 Programmer's Reference 4th edition): see "collations" in the index, and especially pages 459 et seq on xsl:sort. And of course in many other places. I don't think the readers of this list necessarily want to follow every small step in your learning curve.

The choice of collation is made in the stylesheet (or other program); it is NOT a property of the data. There are various reasons for that decision, the main one being that when you publish a phone book, it's the users of the phone book whose requirements you are concerned with, not the nationality of the people whose names are listed in the book. So xml:lang in the data makes no difference. But a lang attribute on xsl:sort does make a difference.
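
As a rough illustration of that last point, here is a minimal sketch of a sort whose collation comes from the stylesheet rather than from the data. The input vocabulary (a names element containing name children) is hypothetical and chosen only for the example; which collation a given lang value actually maps to depends on the processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <names><name>...</name>...</names> -->
  <xsl:template match="/names">
    <sorted>
      <xsl:for-each select="name">
        <!-- The lang attribute on xsl:sort asks the processor for a
             collation suited to that language. Any xml:lang present
             on the <name> elements plays no part in the ordering. -->
        <xsl:sort select="." lang="sv"/>
        <name><xsl:value-of select="."/></name>
      </xsl:for-each>
    </sorted>
  </xsl:template>

</xsl:stylesheet>
```

With lang="sv" one would expect names beginning with "Ä" to come after those beginning with "Z", as in the Swedish alphabet, whereas with lang="de" they would normally sort alongside "A". The exact ordering is up to the collation the processor supplies.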

To take a simple example where the choice of collation makes a difference,

<xsl:value-of select="'a' eq 'A'" default-collation="http://saxon.sf.net/collation?ignore-case=yes"/>

will give different results from

<xsl:value-of select="'a' eq 'A'" default-collation="http://saxon.sf.net/collation?ignore-case=no"/>

Choosing a collation based on language alone will not usually affect the result of the '=' operator, only '<' and '>', because the language-based rules are mainly designed to influence sort behaviour, and for good sorting behaviour you usually want to treat all strings as distinct.
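
For a language-only case, a sketch along the following lines may help. It assumes Saxon's collation URI accepts a lang parameter (as in http://saxon.sf.net/collation?lang=sv) and that the underlying collator follows the usual alphabet conventions for German and Swedish; both are assumptions to check against your Saxon version. The template name "main" is just a placeholder entry point.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical named entry point; run with the initial template "main". -->
  <xsl:template name="main">
    <!-- &#xE4; is "a with diaeresis", written as a character reference
         so that the file encoding does not matter. -->

    <!-- German rules: the letter is treated as a variant of "a",
         so it should compare as less than "z". -->
    <xsl:value-of select="'&#xE4;' lt 'z'"
        default-collation="http://saxon.sf.net/collation?lang=de"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Swedish rules: the letter is a separate letter placed after "z",
         so the same comparison should go the other way. -->
    <xsl:value-of select="'&#xE4;' lt 'z'"
        default-collation="http://saxon.sf.net/collation?lang=sv"/>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions the first comparison would be expected to yield true and the second false, while an 'eq' test between the two strings gives false under either collation.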

Michael Kay
Saxonica

On 07/01/2013 15:28, Costello, Roger L. wrote:
Hi Folks,

Michael Kay wrote this response to a StackOverflow question [1]:

     Saxon's default collation is Unicode codepoint, which is fast
     but not smart. Setting lang="en" will immediately give you a
     smarter natural-language collation. There are then many
     options to refine it further.

QUESTIONS
1. Does Michael's response mean that, to set the collation, I can use the xml:lang attribute instead of the default-collation attribute?

2. Would you please give an example of a comparison where the result of the comparison is true when xml:lang="A" but false when xml:lang="B"? That is, what values would you place in here:

     <Test xml:lang="__">
         <xsl:value-of select=" '__' lt '__' " />
     </Test>

/Roger

[1] http://stackoverflow.com/questions/13052896/xslt-sort-edge-case-for-ascending-sort-by-element-name
