Subject: Re: [xsl] [XSLT 2 or 3 - Diacritics] Removal
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Aug 2022 13:43:13 -0000

On Mon, Aug 08, 2022 at 01:14:53PM -0000, Christophe Marchand cmarchand@xxxxxxxxxx scripsit:
> I want to translate all accented chars into their non-accented version :
>
> é -> e
> Ë -> E
> ç -> c
>
> I can not remember the diacritic unicode block name to match them and use it
> in replace(2).
>
> Does someone remember it ?

I think you mean the Unicode decomposition trick to get rid of all the
accents:

    <xsl:sequence select="normalize-unicode($accents, 'NFD') => replace('\p{Mn}', '') => normalize-unicode('NFC')" />

(Mn is the general category "Mark, nonspacing"; M is the whole "Mark"
category, and "Combining Diacritical Marks" is the name of the block
U+0300-U+036F.)

With the caveat that some things that look like accented characters are
real letters so far as the Unicode committee is concerned; they have no
decomposition, so this doesn't work on them. (E.g., O with stroke,
U+00D8 and U+00F8, keeps the stroke.)

-- 
Graydon Saunders | graydonish@xxxxxxxxx
Þæs oferéode, ðisses swa mæg.
-- Deor ("That passed, so may this.")
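
Since the subject line asks about XSLT 2 as well as 3: the `=>` arrow operator in the one-liner above is XPath 3.1 syntax, so it is not available to an XSLT 2.0 processor. A minimal sketch of the same decompose, strip, recompose steps written as nested calls, which should work in both versions, might look like the following. The function name `f:strip-marks`, its namespace URI, and the `main` template are made up for illustration and are not from the original message.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:strip-marks"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: same three steps as the arrow-operator
       one-liner, nested so that no XPath 3.1 syntax is needed. -->
  <xsl:function name="f:strip-marks" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <!-- Innermost call: NFD splits each precomposed character into a
         base letter followed by its combining marks.
         Middle call: replace() deletes every nonspacing mark (\p{Mn}).
         Outermost call: NFC recomposes whatever is left. -->
    <xsl:sequence select="normalize-unicode(
                            replace(normalize-unicode($text, 'NFD'), '\p{Mn}', ''),
                            'NFC')"/>
  </xsl:function>

  <!-- Illustrative entry point; run it as the initial named template.
       The character references are e-acute, E-diaeresis and c-cedilla,
       so the text written to the output is: eEc -->
  <xsl:template name="main">
    <xsl:value-of select="f:strip-marks('&#xE9;&#xCB;&#xE7;')"/>
  </xsl:template>

</xsl:stylesheet>
```

Declaring the parameter as `xs:string?` lets the function accept an empty sequence, in which case `normalize-unicode()` returns a zero-length string. Characters that have no canonical decomposition pass through unchanged, so this is a sketch of the trick rather than a general-purpose transliterator.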