[xsl] unicode normalisation

Subject: [xsl] unicode normalisation
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 21 Jun 2001 17:11:53 +0100
>  Other words : does XSLT already provide
> normalization of strings before comparison?

The parser will have ensured that all strings are expressed in Unicode
(i.e. it translates whatever file encoding was specified in the <?xml
encoding="...."?> declaration in the file), so you don't need to worry
about differences between, say, Latin-1 and Windows code pages.
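
A small sketch of that point, using Python's bundled XML parser rather
than any particular XSLT engine (the element name "w" and the sample word
are made up for illustration). The same document stored in two encodings
should come out of the parser as the same Unicode string:

```python
import xml.etree.ElementTree as ET

# The same one-element document, serialised in two different encodings.
# The element name "w" and the sample word are hypothetical.
latin1_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><w>caf\u00e9</w>'.encode("iso-8859-1")
utf8_bytes = '<?xml version="1.0" encoding="UTF-8"?><w>caf\u00e9</w>'.encode("utf-8")

# The parser decodes each byte stream according to its own declaration,
# so the application only ever sees Unicode characters.
latin1_text = ET.fromstring(latin1_bytes).text
utf8_text = ET.fromstring(utf8_bytes).text

# True when both documents yield the same string after parsing.
same_text = latin1_text == utf8_text
```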

However, it does not do Unicode character normalisation (e.g. replacing an
e followed by a combining acute character by the single e-acute
character).
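
To make the distinction concrete, here is a short Python illustration
(not XSLT) of the two spellings of e-acute. The variable names are just
for this example; the code points are U+00E9 for the precomposed
character and U+0065 U+0301 for e plus combining acute:

```python
import unicodedata

composed = "\u00e9"      # single precomposed e-acute character
decomposed = "e\u0301"   # plain e followed by a combining acute accent

# A plain code-point comparison treats them as different strings.
equal_as_is = composed == decomposed

# After normalising both to the same form (NFC here) they compare equal.
equal_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
```

A string comparison in a stylesheet behaves like the first test, which is
why the two spellings are not considered equal without a normalisation step.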

If you want to do a test that considers e-acute equal to e followed by
combining-acute, then almost certainly your best bet would be to run the
input file through a separate Unicode normalising program _before_
giving it to XSLT.
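
As a rough sketch of such a pre-processing step, something like the
following Python would do. The function name normalise_file, the choice
of NFC, and the assumption that the input is UTF-8 are all mine for the
sake of the example; a file in another encoding would need that encoding
passed in (and its XML declaration kept consistent with the output):

```python
import unicodedata


def normalise_file(src_path, dst_path, encoding="utf-8", form="NFC"):
    """Copy src_path to dst_path with its text Unicode-normalised.

    Hypothetical helper: assumes the whole file fits in memory and that
    `encoding` matches the encoding the file was actually saved in.
    """
    with open(src_path, encoding=encoding) as src:
        text = src.read()
    normalised = unicodedata.normalize(form, text)
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(normalised)
    return normalised
```

The stylesheet is then run on the normalised copy, and any literal strings
in the stylesheet itself would need to be in the same normalisation form.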

XSLT's string handling is in theory capable of implementing the Unicode
normalisation algorithm, but it would be painful to write.
(We could spoil Jeni's weekend by suggesting she write an XSLT-native
implementation for her EXSLT collection, but that would be cruel.)

David

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

