Re: collation

Subject: Re: collation
From: "James Clark" <jjc@xxxxxxxxxx>
Date: Wed, 18 Jun 1997 13:59:31 +0700
All the collation stuff in DSSSL is just ISO/IEC 9945-2 with a Scheme
syntax.  If you want to understand it, I would strongly recommend looking
at 9945-2.  At one point there were some examples somewhere on
ftp://dkuug.dk, but that site seems to be down at the moment.

One reason you need multiple level-sort-rules is languages like French.
Typically you sort first ignoring accents, and then use accents to order
words that compare the same when you ignore accents.  The forward/backward
business is also for French.  If you have two strings that compare equal
ignoring accents, then you look for the *last* character that differs
when you don't ignore accents, and that character decides the order.
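
To make the French case concrete, here is a rough sketch in Python rather than DSSSL.  It is not the DSSSL or 9945-2 mechanism itself, only an imitation of the two levels: level 1 compares base letters forward, level 2 compares accents either forward or backward.  The function names are made up for the illustration, and ordering the accents by the code point of their combining mark is an assumption; a real collation table assigns its own weights.

```python
import unicodedata


def split_levels(word):
    """Split a word into base letters and per-letter accent marks."""
    # Normalize to NFC first so each accented letter is one character.
    word = unicodedata.normalize("NFC", word)
    bases, accents = [], []
    for ch in word:
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0])      # the unaccented letter
        accents.append(decomposed[1:])   # "" when the letter has no accent
    return bases, accents


def forward_key(word):
    """Level 1: base letters. Level 2: accents compared left to right."""
    bases, accents = split_levels(word)
    return ("".join(bases), accents)


def french_key(word):
    """Level 1: base letters. Level 2: accents compared right to left."""
    bases, accents = split_levels(word)
    # Assumption: accents are ordered by the code point of the combining mark.
    return ("".join(bases), accents[::-1])


words = ["côté", "coté", "côte", "cote"]
forward_order = sorted(words, key=forward_key)
french_order = sorted(words, key=french_key)
print(forward_order)  # ['cote', 'coté', 'côte', 'côté']
print(french_order)   # ['cote', 'côte', 'coté', 'côté']
```

All four words are equal at level 1, so only the accent level separates them.  With the backward rule the accent nearest the end of the word is looked at first, which is why "côte" sorts before "coté".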

Actually even with English it's useful to have multiple sorting levels.
The sorting rules are designed to determine the ordering of any
non-identical strings.  One way to deal with non-alphabetic characters in a
string is to sort first ignoring the non-alphabetic characters and then to
use the non-alphabetic characters to order strings that have the same
alphabetic characters.  (This case is what the position keyword is for.)
Also it's common to sort first ignoring case, and then to use case to order
strings that compare equal case-insensitively.
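
The English case can be sketched the same way, again in Python and only as an approximation.  The function name is hypothetical.  Putting the non-alphabetic level before the case level, sorting uppercase before lowercase, and modelling the position keyword as (index, character) pairs are all assumptions made for the example, not something 9945-2 or DSSSL prescribes in this form.

```python
def english_key(s):
    """Three-level sort key: letters, then non-alphabetics, then case."""
    letters = [c for c in s if c.isalpha()]
    # Level 1: alphabetic characters only, case ignored.
    level1 = "".join(letters).lower()
    # Level 2 (assumed model of "position"): each non-alphabetic
    # character together with where it occurs in the string.
    level2 = [(i, c) for i, c in enumerate(s) if not c.isalpha()]
    # Level 3 (assumed order): uppercase sorts before lowercase.
    level3 = [c.islower() for c in letters]
    return (level1, level2, level3)


entries = ["cop", "co-op", "coop", "Coop", "co op"]
english_order = sorted(entries, key=english_key)
print(english_order)  # ['Coop', 'coop', 'co op', 'co-op', 'cop']
```

The four spellings of "coop" are equal at level 1 and are separated only by the later levels, so every pair of non-identical strings still gets a definite order.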

James


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

