Subject: Re: [xsl] [XSLT 2 or 3 - Diacritics] Removal
From: "Christophe Marchand cmarchand@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Aug 2022 13:49:36 -0000
Thanks, Graydon! That's what I was looking for.

Christophe
On Mon, Aug 08, 2022 at 01:14:53PM -0000, Christophe Marchand cmarchand@xxxxxxxxxx scripsit:
> I want to translate all accented characters into their unaccented versions:
> é -> e
> É -> E
> ç -> c
> I cannot remember the name of the Unicode diacritics block so that I can match them and use it in replace(2).
> Does anyone remember it?

I think you mean the Unicode decomposition trick to get rid of all the accents:
<xsl:sequence select="normalize-unicode($accents, 'NFD') => replace('\p{Mn}', '') => normalize-unicode('NFC')" />
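(I think Mn is the general category "Mark, nonspacing", and M covers all marks. "Combining Diacritical Marks" is the name of the block U+0300 to U+036F, which the regex matches as \p{IsCombiningDiacriticalMarks}.)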
With the caveat that some things that look like accented characters are real letters so far as the Unicode committee is concerned, so they have no decomposition and this doesn't work. (E.g., o with stroke, U+00D8 and U+00F8, keeps the stroke.)
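
For anyone who wants to try the decomposition trick on its own, here is a minimal sketch wrapped in a stylesheet function. The function name f:strip-accents, the urn:example:local namespace, and the sample string are assumptions made up for illustration; nothing in the thread prescribes them. It relies on the arrow operator, so it assumes an XSLT 3.0 processor with XPath 3.1 support.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: the name and namespace are illustrative only. -->
  <xsl:function name="f:strip-accents" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <!-- 1. NFD splits each precomposed letter into base letter + combining marks.
         2. replace() deletes every nonspacing mark (general category Mn).
         3. NFC recomposes whatever is left into its canonical composed form. -->
    <xsl:sequence select="normalize-unicode($text, 'NFD')
                          => replace('\p{Mn}', '')
                          => normalize-unicode('NFC')"/>
  </xsl:function>

  <!-- Default entry point, so no source document is needed. -->
  <xsl:template name="xsl:initial-template">
    <!-- Outputs: eEc øØ
         The o with stroke has no canonical decomposition, so it passes through unchanged. -->
    <xsl:value-of select="f:strip-accents('éÉç øØ')"/>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon, for instance, this should run from the command line using the -it option, which invokes xsl:initial-template. Letters that do not decompose would need a separate explicit mapping, for example with translate().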