Subject: RE: [xsl] XSLT script to report Unicode characters and code blocks in file?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 30 May 2008 13:01:21 +0100

> Take alpha U+0391. The UCD says that is Lu so it matches
> \p{Lu} but that just tells you it's an upper case letter, it
> doesn't tell you it's in the block
> <block start="00370" end="003FF" name="Greek and
> Coptic"/> does it?

That's true if by "UCD" you mean the UnicodeData.txt file. But that's only one of the files in the Unicode database; another file, Blocks.txt, does contain the required information.

Incidentally, the block names in the current version of Blocks.txt do not exactly match the block names defined for Schema (and XPath) regular expressions: for example, 0370..03FF was once "Greek" but is now "Greek and Coptic". The Schema WG is close to deciding that the Unicode names are definitive, which means that regular expressions become invalid when Unicode decides to change the names of the blocks...

Michael Kay
http://www.saxonica.com/
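
To illustrate the distinction between the two files, here is a minimal Python sketch (not XSLT, and not anything from the original thread). It assumes a short inline excerpt written in the "start..end; name" layout of Blocks.txt rather than the full file; the names BLOCKS_EXCERPT, parse_blocks and block_of are made up for this example. The general category comes from Python's unicodedata module, while the block has to be looked up in the range table.

```python
import unicodedata

# Assumption: a short excerpt in the Blocks.txt layout ("start..end; name").
# A real script would read the full Blocks.txt file instead.
BLOCKS_EXCERPT = """\
# Start Code..End Code; Block Name
0000..007F; Basic Latin
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
"""


def parse_blocks(text):
    """Parse Blocks.txt-style lines into (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, name = line.split(";", 1)
        start, end = code_range.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return blocks


def block_of(char, blocks):
    """Return the name of the block containing char, or None if not listed."""
    cp = ord(char)
    for start, end, name in blocks:
        if start <= cp <= end:
            return name
    return None


BLOCKS = parse_blocks(BLOCKS_EXCERPT)
alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

print(unicodedata.category(alpha))  # general category, as in UnicodeData.txt
print(block_of(alpha, BLOCKS))      # block name, as in Blocks.txt
```

The category lookup says only that the character is an upper case letter; the block name is available only from the range table, which is the point made above.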