RE: [xsl] XSLT script to report Unicode characters and code blocks in file?

Subject: RE: [xsl] XSLT script to report Unicode characters and code blocks in file?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 30 May 2008 13:01:21 +0100
> Take alpha U+0391.  The UCD says that is Lu so it matches
> \p{Lu} but that just tells you it's an upper case letter, it
> doesn't tell you it's in the block
>       <block start="00370" end="003FF" name="Greek and Coptic"/>
> does it?

That's true if by "UCD" you mean the UnicodeData.txt file. But that's only
one of the files in the Unicode Character Database; another file, Blocks.txt,
does contain the required information.
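
As a rough illustration of the difference between the two files, here is a
minimal Python sketch (not XSLT, and not anything from the original thread).
It assumes the Blocks.txt line format "start..end; name" and embeds only a
handful of sample lines instead of reading the real file; the names
SAMPLE_BLOCKS, parse_blocks and block_of are made up for this example.

```python
import unicodedata

# A few lines in the Blocks.txt format ("start..end; name"), embedded here
# as a sample. The real file is much longer; this excerpt is an assumption
# made to keep the sketch self-contained.
SAMPLE_BLOCKS = """\
# comment lines start with '#'
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
"""


def parse_blocks(text):
    """Return a list of (start, end, name) tuples from Blocks.txt-style text."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        range_part, name = line.split(";", 1)
        start, end = range_part.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return blocks


def block_of(char, blocks):
    """Return the name of the block containing char, or None if not listed."""
    cp = ord(char)
    for start, end, name in blocks:
        if start <= cp <= end:
            return name
    return None


blocks = parse_blocks(SAMPLE_BLOCKS)
alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

# The general category comes from UnicodeData.txt; the block comes from
# the Blocks.txt-style ranges parsed above.
print(unicodedata.category(alpha), block_of(alpha, blocks))
```

The general category (here via Python's unicodedata module) says nothing about
the block, so the block has to be looked up from the ranges separately.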

Incidentally, the block names in the current version of Blocks.txt do not
exactly match those defined for schema (and XPath) regular expressions:
for example, 0370..03FF was once "Greek" but is now "Greek and Coptic". The
Schema WG is close to deciding that the Unicode names are definitive, which
means that regular expressions can become invalid when Unicode decides to
change the names of the blocks...

Michael Kay
http://www.saxonica.com/
