Re: [xsl] Getting the Base Character of Character with Diacritic

Subject: Re: [xsl] Getting the Base Character of Character with Diacritic
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Tue, 19 Sep 2006 10:56:54 +0200
Michael Kay wrote:
> Following up on suggestions from others, if NFKD is supported then the
> following should work reasonably well for European languages:
>
> replace(normalize-unicode($in, 'NFKD'), '[&#x0300;-&#x036F;]', '')

You are right. I had thought that the Modifier characters (x02B0 - x02FF) could also be used for "modifying" a certain character. In an earlier post I mentioned the macron and circumflex as 0x02C9 and 0x02C6, but these were wrong. That range does include macron, diaeresis, circumflex etc., but as I understand it now, these are not used for "modifying/combining letters" but for "modifying spacing" (e.g. quotes etc.), and as a result they do not influence normalization.
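
For anyone who wants to check this outside an XSLT processor, here is a minimal sketch in Python. It assumes that Python's unicodedata.normalize behaves like normalize-unicode($in, 'NFKD') for these characters. The helper name strip_diacritics is made up for illustration and is not part of any library.

```python
import re
import unicodedata

# Combining Diacritical Marks block, the same range as in the XPath expression.
COMBINING_MARKS = re.compile("[\u0300-\u036F]")


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: NFKD-normalize, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return COMBINING_MARKS.sub("", decomposed)


# Precomposed letters decompose into base letter + combining mark,
# and the combining mark is then removed.
print(strip_diacritics("\u00C5ngstr\u00F6m caf\u00E9"))  # Angstrom cafe

# U+02C6 and U+02C9 (the modifier circumflex and macron) have no
# decomposition and lie outside the combining range, so they survive.
sample = "a\u02C6 e\u02C9"
print(strip_diacritics(sample) == sample)  # True
```

The second check is the point made above: the 0x02C6 and 0x02C9 characters pass through NFKD unchanged, so they neither attach to a letter nor get stripped by the replace.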

-- Abel Braaksma
