RE: [xsl] Generate identifier

Subject: RE: [xsl] Generate identifier
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 29 Dec 2009 21:10:13 -0000
> Now, I have to build a name containing only [A-Za-z0-9].
> My problem is that I often see characters with modifiers like
> 00E0 à LATIN SMALL LETTER A WITH GRAVE
> 00E1 á LATIN SMALL LETTER A WITH ACUTE
> 00E2 â LATIN SMALL LETTER A WITH CIRCUMFLEX
> 00E3 ã LATIN SMALL LETTER A WITH TILDE
> 00E4 ä LATIN SMALL LETTER A WITH DIAERESIS ...
>
> My questions:
>   is it acceptable, from the perspective of a western
> language, to replace those characters with a character
> without modifier;
>   is there a way to do this in xslt;

You can use normalize-unicode($input, 'NFD') to convert the string to
decomposed normal form; the diacritics will then be present as separate
characters, which you can detect and remove using a regular expression -
probably the same regex that removes other unwanted characters.
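
As a rough sketch of the above in XSLT 2.0 (the function name f:ascii-id, its
namespace URI and the "main" template are made up for illustration, and this
assumes your processor supports the 'NFD' form, which the spec leaves optional):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- f:ascii-id is a hypothetical helper, not part of any library -->
  <xsl:function name="f:ascii-id" as="xs:string">
    <xsl:param name="input" as="xs:string"/>
    <!-- NFD turns e.g. U+00E0 into "a" followed by U+0300 (combining grave) -->
    <xsl:variable name="decomposed"
                  select="normalize-unicode($input, 'NFD')"/>
    <!-- one regex removes the combining marks together with every other
         character outside A-Za-z0-9 -->
    <xsl:sequence select="replace($decomposed, '[^A-Za-z0-9]', '')"/>
  </xsl:function>

  <!-- illustrative entry point: "d&#xE9;j&#xE0; vu 2009" gives "dejavu2009" -->
  <xsl:template name="main">
    <xsl:value-of select="f:ascii-id('d&#xE9;j&#xE0; vu 2009')"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that letters with no canonical decomposition (for example U+00F8, o with
stroke, or U+00DF, sharp s) are not split by NFD, so this regex simply drops
them; if you need them you would have to map them explicitly, e.g. with
translate(), before the replace().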

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
