RE: [xsl] How do I not ignore whitespace when sorting?

Subject: RE: [xsl] How do I not ignore whitespace when sorting?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sun, 18 Dec 2005 16:49:16 -0000
You could try using the Unicode codepoint collation:

<xsl:sort...
collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
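
As a rough illustration of what codepoint ordering does to the two headings
from the original question, here is a minimal Java sketch. It does not use
Saxon at all; the class and method names (CodepointOrder, compareByCodepoint,
sortByCodepoint) are made up for this example. It only shows that a space
(U+0020) compares lower than a period (U+002E) when strings are compared
purely by code point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CodepointOrder {

    // Compare two strings code point by code point, with no locale rules.
    static int compareByCodepoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Return a sorted copy, leaving the input list untouched.
    static List<String> sortByCodepoint(List<String> headings) {
        List<String> copy = new ArrayList<>(headings);
        copy.sort(CodepointOrder::compareByCodepoint);
        return copy;
    }

    public static void main(String[] args) {
        List<String> headings =
            Arrays.asList("Catalogs. Austria", "Catalogs -- Zepplins");
        for (String heading : sortByCodepoint(headings)) {
            System.out.println(heading);
        }
    }
}
```

One consequence worth keeping in mind: codepoint order also puts every
upper-case ASCII letter before every lower-case one, so it is only a good fit
if the headings are capitalised consistently.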

With Saxon 8 you can map a collation URI to any Java collator class; for
example, you can set up your own RuleBasedCollator. From your example,
however, I'm not convinced that a collation you design yourself is going to
work better, in general, than the standard English-language locale collation
supplied by the Java VM: you might get better results for some specific
cases, but I think you're unlikely to get better results across the board.
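
For anyone who wants to experiment with a hand-built collator, the following
is a minimal sketch of a java.text.RuleBasedCollator in which space, hyphen
and period are significant and sort (in that order) before the letters. The
class name SpaceFirstCollation and the rule string are assumptions made up for
this illustration; the rules cover only those three characters and a-z, and
the step of binding the collator to a collation URI in Saxon is not shown,
since that depends on the Saxon version in use.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

// Hypothetical example class; not part of Saxon or the JDK.
class SpaceFirstCollation {

    // Build a collator whose rules list space, hyphen and period as
    // ordinary (non-ignorable) characters ahead of the letters.
    static RuleBasedCollator build() {
        StringBuilder rules = new StringBuilder("< ' ' < '-' < '.'");
        for (char c = 'a'; c <= 'z'; c++) {
            // "< a, A" makes a/A differ only at the tertiary (case) level.
            rules.append(" < ").append(c)
                 .append(", ").append(Character.toUpperCase(c));
        }
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException e) {
            // The rule string is fixed, so a parse failure is a coding error.
            throw new IllegalStateException("Bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        String[] headings = {"Catalogs. Austria", "Catalogs -- Zepplins"};
        // A Collator is a Comparator, so it can be passed straight to sort().
        Arrays.sort(headings, build());
        for (String heading : headings) {
            System.out.println(heading);
        }
    }
}
```

Characters that the rule string does not mention (digits, accented letters and
so on) are left to the collator's handling of unlisted characters, which is
one reason a home-made rule set like this may behave worse in general than the
JVM's locale collation.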

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Mark Wilson [mailto:drmark@xxxxxxxxxxxxxxx] 
> Sent: 18 December 2005 16:08
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] How do I not ignore whitespace when sorting?
> 
> Thanks for a quick response Michael. I'm using Saxon 8. Will look into
> translate().
> 
> Mark
> 
> ----- Original Message ----- 
> From: "Michael Kay" <mike@xxxxxxxxxxxx>
> To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Sunday, December 18, 2005 10:37 AM
> Subject: RE: [xsl] How do I not ignore whitespace when sorting?
> 
> 
> > Collating sequences in XSLT 1.0 are entirely implementation-defined.
> > There's a strong suggestion in the spec that the system should choose a
> > collation appropriate to the natural language in use. Conventions for
> > handling spaces and punctuation within the strings to be sorted are
> > notoriously variable, but ignoring spaces is quite common, and certainly
> > not incorrect: if I look up "ad hoc" in a dictionary, I expect to find it
> > between "adhesive" and "adieu".
> >
> > In 2.0 there's a more formal mechanism for identifying the collation you
> > want to be used, but it's still essentially implementation-defined what
> > collations are available with a particular product.
> >
> > You might find that a pragmatic solution is to use translate() to modify
> > the characters in the string before sorting.
> >
> > Michael Kay
> > http://www.saxonica.com/
> >
> >> -----Original Message-----
> >> From: Mark Wilson [mailto:drmark@xxxxxxxxxxxxxxx]
> >> Sent: 18 December 2005 15:29
> >> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >> Subject: [xsl] How do I not ignore whitespace when sorting?
> >>
> >> Thanks for all the help with my previous question. This is my first week
> >> with XSLT, so please forgive my not knowing where to look. I have tracked
> >> down many comments on sorting, but failed to find one to answer the
> >> following.
> >>
> >> I am trying to sort a library subject catalog. I finally realized that the
> >> default <xsl:sort> was ignoring whitespace within the value of the element.
> >> The order I get is:
> >> Catalogs. Austria
> >> Catalogs -- Zepplins
> >>
> >> when what I want is:
> >> Catalogs -- Zepplins
> >> Catalogs. Austria.
> >> (for the *space* at the end of 'catalogs ' to file before the *period* at
> >> the end of 'catalogs.')
> >>
> >> I was able to demonstrate to myself that white space is ignored when I
> >> inserted a *space within* the word 'cat alogs' and still got this result:
> >> Catalogs. Austria
> >> Cat alogs -- Zepplins
> >>
> >> Since this means that everyone would have serious problems sorting things
> >> like "Art tissue" and "Artisians", I assume there must be a way.
> >> Can I, and if so how do I, tell <xsl:sort> not to ignore whitespace? Do I
> >> need a special external collation?
> >> Thanks,
> >> Mark
