Re: [xsl] [XSLT 2 or 3 - Diacritics] Removal

Subject: Re: [xsl] [XSLT 2 or 3 - Diacritics] Removal
From: "Christophe Marchand cmarchand@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Aug 2022 13:49:36 -0000
Thanks, Graydon! That's what I was looking for.
Christophe

On 08/08/2022 at 15:43, Graydon graydon@xxxxxxxxx wrote:
On Mon, Aug 08, 2022 at 01:14:53PM -0000, Christophe Marchand cmarchand@xxxxxxxxxx wrote:
I want to translate all accented characters into their unaccented equivalents:

é -> e
É -> E
ç -> c

I can't remember the name of the Unicode block for diacritics, which I need
in order to match them with replace().

Does anyone remember it?
I think you mean the Unicode decomposition trick to get rid of all the
accents:

<xsl:sequence
    select="normalize-unicode($accents, 'NFD') => replace('\p{Mn}', '') => normalize-unicode('NFC')"/>

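(Mn is the Unicode general category "Mark, nonspacing"; \p{M} matches all
marks. "Combining Diacritical Marks" is the name of the block, U+0300 to
U+036F, which an XPath regex can match as \p{IsCombiningDiacriticalMarks}.)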
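The => arrow operator in that expression is XPath 3.1, so it needs an XSLT 3.0 processor. Below is a minimal sketch of the same NFD, strip \p{Mn}, NFC pipeline written with nested calls, which should also work under XSLT 2.0. The function name f:strip-diacritics, its namespace URI and the sample input are hypothetical. It also assumes the processor supports the 'NFD' form: the spec only requires 'NFC', and an unsupported form raises FOCH0003.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:diacritics"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose (NFD, assumed supported by the processor),
       delete nonspacing marks, then recompose (NFC). No arrow operator, so XSLT 2.0 is enough. -->
  <xsl:function name="f:strip-diacritics" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="normalize-unicode(replace(normalize-unicode($s, 'NFD'), '\p{Mn}', ''), 'NFC')"/>
  </xsl:function>

  <!-- Entry point: start the transformation at the template named "main". -->
  <xsl:template name="main">
    <xsl:value-of select="f:strip-diacritics('é É ç')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run it with an initial template named main (in Saxon, the -it:main option). It should print e E c.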

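With the caveat that some things that look like accented characters are
real letters so far as the Unicode committee is concerned: they have no
decomposition, so this doesn't work on them.  (E.g., O with stroke, U+00D8
and U+00F8, keeps its stroke. A-ring, U+00C5 and U+00E5, does decompose to
A plus a combining ring, so it loses the ring.)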
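Letters such as Ø/ø, Ł/ł and Đ/đ have no canonical decomposition, so the \p{Mn} pass leaves them untouched. One possible workaround, sketched below, is to follow the pipeline with translate() and a hand-written mapping. The function name f:fold-letters, its namespace URI and the six-letter mapping are hypothetical, illustrative choices, not an exhaustive table. As above, the sketch assumes the processor supports 'NFD'.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:diacritics"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: strip combining marks (NFD assumed supported),
       then fold a few letters that Unicode does not decompose. -->
  <xsl:function name="f:fold-letters" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:variable name="stripped" as="xs:string"
        select="normalize-unicode(replace(normalize-unicode($s, 'NFD'), '\p{Mn}', ''), 'NFC')"/>
    <!-- translate() maps character by character: Ø ø Ł ł Đ đ become O o L l D d.
         Illustrative sample only; extend the two strings for other letters. -->
    <xsl:sequence select="translate($stripped, 'ØøŁłĐđ', 'OoLlDd')"/>
  </xsl:function>

  <xsl:template name="main">
    <xsl:value-of select="f:fold-letters('Øresund Łódź Đakovo')"/>
  </xsl:template>
</xsl:stylesheet>
```

Here ó and ź lose their accents in the \p{Mn} step, while Ø, Ł and Đ are only handled by translate(), so the expected output is Oresund Lodz Dakovo.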
