Re: [xsl] Sorting Upper-Case first. Microsoft bug?

Subject: Re: [xsl] Sorting Upper-Case first. Microsoft bug?
From: Markus Abt <abt@xxxxxxxx>
Date: Wed, 6 Aug 2003 21:52:04 +0200
Hello Stan,

Stan Devitt wrote:
>
>I apologize for yet another message on lexicographic sorting but
>in light of the considerable confusion exhibited on this issue I'd like
>to see three points emphasised.

It seems that I have a completely different view of these topics.

>
>1.  Lexicographic sorting is important precisely because it is so well
>defined (and because of this I suspect the spec writers really meant it
>when they wrote it in). It provides an easy-to-check reference
>implementation that is 99% usable.

XSLT was made to transform documents in the first place.
Typical problems in this field are "how can I sort my index?" or
"how can I sort my glossary?".

A lexicographic sort in your sense is not useful at all (= 0%) here:
-  I don't want the group headers "A" "a" "B" "b" in my index.
-  I don't want "XSLT" before all or after all "Xs..." words.
-  I don't want "eXtensible" before "eat" or after "eye".
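
To make these cases concrete, here is a minimal sketch, assuming that
"lexicographic" is read as plain code-point comparison. Python is used
purely for illustration, and the word list is made up for the example.
The second sort (case-folded, with the original string as a tie-breaker)
is only one possible alternative, not what any XSLT processor is known
to implement.

```python
# Plain code-point comparison: every upper-case ASCII letter (65-90)
# sorts before every lower-case one (97-122).
index_entries = ["eye", "eXtensible", "eat", "Xslt", "XSLT", "Xsl"]

by_code_point = sorted(index_entries)

# Hypothetical alternative: compare case-folded text first and use the
# original string only to break ties.
case_folded = sorted(index_entries, key=lambda s: (s.casefold(), s))

print(by_code_point)  # ['XSLT', 'Xsl', 'Xslt', 'eXtensible', 'eat', 'eye']
print(case_folded)    # ['eat', 'eXtensible', 'eye', 'Xsl', 'XSLT', 'Xslt']
```

In the first result "XSLT" lands before all the "Xs..." words and
"eXtensible" lands before "eat", which is exactly what an index should
not do.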

>
>2.  The notion of "lexicographic sorting" in the "culturally correct"
>manner is also valid, but it falls short of implementing all of UTR 10.
>The only "cultural choice" you have in a lexicographic sort is in
>deciding on a total order of the symbols of your alphabet.
>After that, everything else is determined.

Sorting by UTR 10 doesn't mean "sort undetermined" or "sort randomly".
There are exact rules, some of which are cultural choices, most not.
UTR 10 provides sorting on more than one level. Look at UTR 10,
section 4 to see the algorithm.

Look at the proposal http://www.unicode.org/reports/tr10/tr10-10.html
for some great examples (in the first chapter).
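
The idea of sorting on several levels can be sketched roughly as
follows. This is a strongly simplified illustration and NOT the UTR 10
algorithm itself: the function name multilevel_key is hypothetical, the
three levels are approximated with Unicode decomposition instead of a
real collation element table, and "lower case before upper case" on the
third level is an arbitrary choice made for the example.

```python
import unicodedata

def multilevel_key(s):
    """Simplified three-level sort key (an approximation, not UTR 10)."""
    decomposed = unicodedata.normalize("NFD", s)
    # Level 1: base letters only, ignoring accents and case.
    primary = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).casefold()
    # Level 2: the accents, consulted only when level 1 is equal.
    secondary = "".join(c for c in decomposed if unicodedata.combining(c))
    # Level 3: case, consulted last; swapcase() puts lower case first.
    tertiary = s.swapcase()
    return (primary, secondary, tertiary)

# "r\u00f4le" is "role" with a circumflex on the "o".
words = ["rule", "roles", "r\u00f4le", "Role", "role"]

by_levels = sorted(words, key=multilevel_key)
by_code_point = sorted(words)

print(by_levels)      # role, Role, rôle, roles, rule
print(by_code_point)  # Role, role, roles, rule, rôle
```

The result is fully determined in both cases, but only the multi-level
key keeps the variants of "role" next to each other.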

On the other hand, how should one define a universal total order
of all Unicode symbols, to achieve a sensible lexicographic sorting?
Is "ä" smaller or greater or equal or unrelated to "a"?
How are the Greek "alpha" or the Hebrew "aleph" related to "a"?

Lexicographic sorting of Unicode strings is not useful for anything
practical I can think of.
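
One total order that does exist is the order of the Unicode code points.
The following minimal sketch assumes that this is what a strictly
"lexicographic" implementation would fall back on (an assumption, the
spec does not say so) and shows where that leaves the characters
mentioned above. The sample words are made up.

```python
# Characters from the questions above, written as escapes:
# a, z, a-umlaut, Greek alpha, Hebrew aleph.
chars = ["a", "z", "\u00e4", "\u03b1", "\u05d0"]
code_points = [ord(c) for c in chars]
print(code_points)  # [97, 122, 228, 945, 1488]

# Consequence for words: a-umlaut sorts after "z", not next to "a".
words = ["zebra", "\u00e4rger", "apfel"]
by_code_point = sorted(words)
print(by_code_point)  # apfel, zebra, ärger
```

The order is well defined, but a glossary sorted this way puts every
word starting with "ä" behind the "z" entries.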

>
>3.  Placing selected "words" out of lexicographic order (however well
>intended) clearly violates the lexicographic constraint of the spec and
>is in error as the spec is currently worded.

Which selected words do you mean here?

>
>As a follow-on action, I'd like to see the spec writers clarify (in the
>spec) that they really do mean lexicographic, and perhaps augment the
>list of available sorts by a "pseudo-lexicographical" or "word" based
>sort in order to capture what actually got implemented and which is
>important for its own reasons but is much less well defined.

I would love to see the spec writers eliminate the single word
"lexicographic" from the spec.

Interestingly, the list of available sorts does NOT include "lexicographic".
The sort method we are talking about is named "text".
I would not want "text" to be replaced by "word", as this is misleading.
I would also not want "text" to be replaced by "pseudo-lexicographical"
because I can't spell it without at least 5 typos.

>
>Stan Devitt

Just my point of view,
Markus

PS: The definition David found at http://mathworld.wolfram.com/LexicographicOrder.html
contains the funny sentence:
"Lexicographic order is sometimes called dictionary order."

__________________________
Markus Abt
Comet Computer GmbH
http://www.comet.de


>
>Markus Abt wrote:
>
>>David,
>>
>>It seems to me that the XSLT specification wants lexicographic ordering in the
>>culturally correct manner.
>>Maybe this is a contradiction; in this case I would regard this as an error in the XSLT spec.
>>

