RE: [xsl] XSLT script to report Unicode characters and code blocks in file?

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 29 May 2008 21:32:56 +0100

I wrote a transformation that uses unparsed-text() and regex processing to
create an XML version of the Unicode database. Once you've got that, you can
easily look up which code block a particular character falls into, because
the block is part of the data for each character. (Well, for most of the
characters: some of the non-BMP ranges share a single entry for a large
group of characters, which needs a bit of care.)

Michael Kay
http://www.saxonica.com/
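
The transformation mentioned above is not included in this message. As a
rough illustration of the same general idea, here is a minimal XSLT 2.0
sketch that swaps in a simpler data source: instead of converting the full
Unicode database, it reads only the block ranges from the Unicode Blocks.txt
file (lines such as "0000..007F; Basic Latin") and groups the distinct
characters of the input by block. The parameter name blocks-uri, the f:
namespace, and the helper functions f:hex and f:block-name are all
hypothetical names chosen for this example, and a local copy of Blocks.txt
is assumed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:x-example:local"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Assumption: a local copy of Blocks.txt, resolved relative to
       the stylesheet's base URI -->
  <xsl:param name="blocks-uri" as="xs:string" select="'Blocks.txt'"/>

  <!-- Hypothetical helper: convert a hexadecimal string to an integer -->
  <xsl:function name="f:hex" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      if ($hex = '') then 0
      else 16 * f:hex(substring($hex, 1, string-length($hex) - 1))
           + string-length(substring-before('0123456789ABCDEF',
               upper-case(substring($hex, string-length($hex)))))"/>
  </xsl:function>

  <!-- One <block> element per data line of Blocks.txt; comment lines
       and blank lines do not match the regex and are ignored -->
  <xsl:variable name="blocks" as="element(block)*">
    <xsl:analyze-string select="unparsed-text($blocks-uri, 'UTF-8')"
        regex="^([0-9A-F]+)\.\.([0-9A-F]+);\s*([^\r\n]+)\s*$" flags="m">
      <xsl:matching-substring>
        <block start="{f:hex(regex-group(1))}"
               end="{f:hex(regex-group(2))}"
               name="{normalize-space(regex-group(3))}"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <!-- Hypothetical helper: name of the block containing a codepoint -->
  <xsl:function name="f:block-name" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:variable name="hit" select="
      $blocks[xs:integer(@start) le $cp and xs:integer(@end) ge $cp][1]"/>
    <xsl:sequence select="if ($hit) then string($hit/@name) else 'No_Block'"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Distinct codepoints of all text nodes, in ascending order -->
    <xsl:variable name="codepoints" as="xs:integer*">
      <xsl:perform-sort
          select="distinct-values(string-to-codepoints(string(/)))">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <!-- Groups appear in order of first occurrence, so blocks come out
         in ascending codepoint order -->
    <xsl:for-each-group select="$codepoints" group-by="f:block-name(.)">
      <xsl:value-of select="concat(current-grouping-key(), ': ',
          string-join(for $c in current-group()
                      return codepoints-to-string($c), ' '),
          '&#10;')"/>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Because Blocks.txt lists ranges rather than individual characters, this
variant does not run into the shared non-BMP entries mentioned above, but it
also yields only the block name and none of the other per-character data.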

> -----Original Message-----
> From: David Sewell [mailto:dsewell@xxxxxxxxxxxx] 
> Sent: 29 May 2008 20:45
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] XSLT script to report Unicode characters and 
> code blocks in file?
> 
> I'm working on a simple XSLT 2.0 script to list all distinct 
> Unicode characters used in a file. That part of the script 
> takes very few lines, thanks to distinct-values(), 
> codepoints-to-string(), and string-to-codepoints().
> 
> However, I'd also like to group the output by code block:
> 
> http://www.fileformat.info/info/unicode/block/index.htm
> 
> Best way I can see to do it is to write a local function that 
> tests the codepoint value and uses lots and lots of 
> <xsl:when> case tests to determine which block the character 
> falls into. Not hard but a bit tedious. Has anyone invented 
> this wheel already?
> 
> DS
> 
> --
> David Sewell, Editorial and Technical Manager ROTUNDA, The 
> University of Virginia Press PO Box 801079, Charlottesville, 
> VA 22904-4318 USA
> Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
> Email: dsewell@xxxxxxxxxxxx   Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
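
For reference, the character-listing step described in the question might
look like the following minimal sketch. It is not the original poster's
script, which was not included in the message. It assumes that only text
nodes are of interest (attribute values are not included in string(/)) and
prints each distinct codepoint as a decimal number followed by the character
itself.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- string(/) concatenates every text node in the document -->
    <xsl:variable name="codepoints" as="xs:integer*"
        select="distinct-values(string-to-codepoints(string(/)))"/>
    <xsl:for-each select="$codepoints">
      <!-- The items are xs:integer values, so the sort is numeric -->
      <xsl:sort select="."/>
      <xsl:value-of
          select="concat(., '&#9;', codepoints-to-string(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```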