Subject: Re: [xsl] XSLT script to report Unicode characters and code blocks in file?
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 30 May 2008 12:47:03 +0100
Colin

> Yes. XML Schema (and hence XPath) regular expressions.

They don't help, do they?

Take alpha, U+0391. The UCD says it is Lu, so it matches \p{Lu}, but that
just tells you it's an upper case letter; it doesn't tell you it's in the
block
      <block start="00370" end="003FF" name="Greek and Coptic"/>
does it? The code I pointed to in the message you replied to would take
an alpha, get its code point, find the string "0039" as the first four
digits of the five-digit hex representation of that code point, then
find this block element in unicode.xml, and from there get to (for
example) http://www.unicode.org/charts/PDF/U0370.pdf, which is the PDF
file that shows the alpha glyph.
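For readers outside XSLT, here is a minimal Python sketch of the same lookup idea, not the code referred to above. It assumes a small hard-coded excerpt shaped like the <block> elements in unicode.xml (the <blocks> wrapper is hypothetical), and it assumes the chart file name is the block start in hex with at least four digits, inferred only from the U0370.pdf example. It also shows that the general category (Lu) says nothing about the block.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical excerpt modelled on the <block> elements in unicode.xml;
# the <blocks> wrapper element is an assumption for this sketch.
BLOCKS_XML = """
<blocks>
  <block start="00000" end="0007F" name="Basic Latin"/>
  <block start="00370" end="003FF" name="Greek and Coptic"/>
  <block start="00400" end="004FF" name="Cyrillic"/>
</blocks>
"""

def find_block(ch, blocks_root):
    """Return (name, start, end) of the block containing ch, or None."""
    cp = ord(ch)
    for block in blocks_root.iter("block"):
        start = int(block.get("start"), 16)
        end = int(block.get("end"), 16)
        if start <= cp <= end:
            return block.get("name"), start, end
    return None

def chart_url(block_start):
    # Assumed naming scheme, inferred from the U0370.pdf example:
    # block start in upper-case hex, padded to at least four digits.
    return f"http://www.unicode.org/charts/PDF/U{block_start:04X}.pdf"

root = ET.fromstring(BLOCKS_XML)
alpha = "\u0391"

print(unicodedata.category(alpha))  # Lu: the general category, not the block
print(f"{ord(alpha):05X}")          # 00391: five-digit hex code point
name, start, end = find_block(alpha, root)
print(name)                         # Greek and Coptic
print(chart_url(start))             # http://www.unicode.org/charts/PDF/U0370.pdf
```

With the real unicode.xml you would parse the file instead of the inline string; the range comparison stays the same.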

Actually, regexps could help: you could take the block range information
and build a regexp that matches each block by generating the required
character range expressions, but I think it's more natural to do that as
an XPath query than to force it through the regexp engine.
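To make the trade-off concrete, here is a small Python sketch (an illustration, not XSLT) that builds one character-range regexp per block from hard-coded ranges, assumed here for brevity, and compares it with a plain range test, the analogue of the XPath query. Python's re module has no \p{...} support, so the ranges are written as \UXXXXXXXX escapes.

```python
import re

# Block ranges as (name, start, end); hard-coded for illustration,
# standing in for the start/end attributes of <block> elements.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Greek and Coptic", 0x0370, 0x03FF),
    ("Cyrillic", 0x0400, 0x04FF),
]

def block_regex(start, end):
    # \UXXXXXXXX escapes (accepted by Python's re in str patterns)
    # avoid quoting problems with range endpoints such as ']' or '-'.
    return re.compile(f"[\\U{start:08X}-\\U{end:08X}]")

BLOCK_PATTERNS = [(name, block_regex(s, e)) for name, s, e in BLOCKS]

def block_by_regex(ch):
    """Find the block by trying each generated regexp in turn."""
    for name, pattern in BLOCK_PATTERNS:
        if pattern.fullmatch(ch):
            return name
    return None

def block_by_range(ch):
    """Find the block by comparing the code point with each range directly."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return None

print(block_by_regex("\u0391"))  # Greek and Coptic
print(block_by_range("\u0391"))  # Greek and Coptic
```

Both give the same answer; the range version skips building and compiling patterns, which is the point about not forcing the lookup through the regexp engine.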

David
