RE: [xsl] Flattening characters to plain latin

Subject: RE: [xsl] Flattening characters to plain latin
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sat, 17 Feb 2007 17:22:31 -0000
> My verdict: If the 'lt' of Michael was on purpose, I still 
> want to grant him the "Best Original Software Snippet Based 
> On Any XXX* Language" ;-)

I think the original problem wasn't especially well specified, and I was
well aware that retaining all the characters below 127 while losing those
above was a pretty crude cutoff. In the light of that, the decision whether
to keep or lose 127 itself is neither here nor there. Almost certainly a
better solution is to discard only the characters in particular
Unicode categories, which should be possible to achieve using replace() with
appropriately selected regular expressions. The basic idea I was trying to
propose was using normalize-unicode to translate into decomposed normal form
and then discarding modifier characters, and I think that's basically a
sound approach.

In fact a better solution might be

replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')

but I'm sure that could be improved further.
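
For anyone who wants to try the decompose-then-strip idea, here is a minimal XSLT 2.0 sketch. The function name f:flatten, its namespace URI, the named template "main" and the sample string are all made up for illustration. The regex uses lowercase \p{Mn}, which matches nonspacing marks; uppercase \P{Mn} is the complement and would delete everything except the marks.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:flatten"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper (not part of any library):
       1. normalize-unicode(..., 'NFKD') splits each accented character
          into a base character followed by combining marks;
       2. replace(..., '\p{Mn}', '') removes the nonspacing marks. -->
  <xsl:function name="f:flatten" as="xs:string">
    <xsl:param name="in" as="xs:string?"/>
    <xsl:sequence
        select="replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')"/>
  </xsl:function>

  <!-- Named entry point so the sketch can run without a source document.
       The sample text is "deja vu, Angstrom" written with accents,
       given as character references to avoid encoding trouble. -->
  <xsl:template name="main">
    <xsl:value-of
        select="f:flatten('d&#xE9;j&#xE0; vu, &#xC5;ngstr&#xF6;m')"/>
  </xsl:template>

</xsl:stylesheet>
```

Run with "main" as the initial template in an XSLT 2.0 processor, this should output "deja vu, Angstrom". Characters that have no Unicode decomposition, such as U+00F8 (o with stroke) or U+00DF (sharp s), pass through unchanged, so a translate() call or a further replace() would still be needed if those have to be mapped to plain Latin as well.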

Michael Kay
http://www.saxonica.com/
