Re: [xsl] [XSLT 2 or 3 - Diacritics] Removal

Subject: Re: [xsl] [XSLT 2 or 3 - Diacritics] Removal
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Aug 2022 13:43:13 -0000
On Mon, Aug 08, 2022 at 01:14:53PM -0000, Christophe Marchand cmarchand@xxxxxxxxxx scripsit:
> I want to translate all accented chars into their non-accented version :
> 
> é -> e
> Ë -> E
> ç -> c
> 
> I can not remember the diacritic unicode block name to match them and use it
> in replace(2).
> 
> Does someone remember it ?

I think you mean the Unicode decomposition trick to get rid of all the
accents:

<xsl:sequence
      select="normalize-unicode($accents, 'NFD') => replace('\p{Mn}', '') => normalize-unicode('NFC')"
     />

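(Mn is the general category "Mark, nonspacing"; M matches every kind
of mark.  The block you're thinking of is "Combining Diacritical
Marks", which in a regex is \p{IsCombiningDiacriticalMarks}, but it only
covers U+0300 to U+036F, so the category is the safer match.)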
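
If you are stuck on XSLT 2.0, there is no arrow operator, but the same
three steps nest as ordinary function calls.  An untested sketch,
assuming the same $accents variable holds the input string:

<xsl:sequence
      select="normalize-unicode(replace(normalize-unicode($accents, 'NFD'), '\p{Mn}', ''), 'NFC')"
     />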

With the caveat that some things that look like accented characters are
real letters so far as the Unicode committee is concerned; they have no
decomposition, so this leaves them alone.  (E.g., O-slash, U+00D8 and
U+00F8, keeps the slash.  A-ring, U+00C5 and U+00E5, does decompose and
loses the ring.)
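
One possible workaround for letters with no decomposition is an explicit
translate() step in the same pipeline.  A sketch only: the mapping below
is an illustrative handful (O-slash, L-stroke, D-stroke, upper and lower
case), not a complete list, and you would extend it for your own data:

<xsl:sequence
      select="normalize-unicode($accents, 'NFD') => replace('\p{Mn}', '') => translate('&#xD8;&#xF8;&#x141;&#x142;&#x110;&#x111;', 'OoLlDd') => normalize-unicode('NFC')"
     />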


-- 
Graydon Saunders  | graydonish@xxxxxxxxx
Þæs oferéode, ðisses swa mæg.
-- Deor  ("That passed, so may this.")
