Subject: Re: [xsl] Unicode character blocks in strings
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 26 May 2009 09:21:23 -0400
I have a string containing a mix of Chinese and Latin characters, e.g. *|.Z'J%R+,H1N1,y7PZL. I wish to return a node set containing the following kind of structure:
*|.Z'J%R+, H1N1 ,y7PZL
Where H1N1 falls into the BasicLatin Unicode character block and the other two strings can be categorized as CJKUnifiedIdeographs.
Given that http://en.wikipedia.org/wiki/Basic_Latin_unicode_block defines the characters up to the tilde, this can be done with a character range.
Can anyone suggest the cleanest way to do this using XSLT 2?
I prefer Michael's and David's suggestion of using Unicode classes, but below is what I threw together quickly.
T:\ftemp>type tom.xml
<doc>?????H1N1??</doc>

T:\ftemp>call xslt2 tom.xml tom.xsl
<?xml version="1.0" encoding="UTF-8"?><doc><other>?????</other><latin>H1N1</latin><other>??</other></doc>

T:\ftemp>type tom.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<!--split each text node into runs inside and outside the range ! to ~-->
<xsl:template match="text()" priority="1">
  <xsl:analyze-string select="." regex="[!-~]+">
    <xsl:matching-substring>
      <latin><xsl:value-of select="."/></latin>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <other><xsl:value-of select="."/></other>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
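For comparison, here is a minimal sketch of how the Unicode classes approach might look, assuming the processor supports the XML Schema block escapes \p{IsBasicLatin} and \p{IsCJKUnifiedIdeographs} in regular expressions. The element names cjk, latin and other are illustrative only and not from the original question. Note that \p{IsBasicLatin} covers the whole block, so unlike [!-~] it also matches spaces and control characters.

```xslt
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<!--classify each run of text by Unicode block; the element names
    cjk, latin and other are hypothetical-->
<xsl:template match="text()" priority="1">
  <!--braces are doubled because the regex attribute is an
      attribute value template-->
  <xsl:analyze-string select="." regex="\p{{IsCJKUnifiedIdeographs}}+">
    <xsl:matching-substring>
      <cjk><xsl:value-of select="."/></cjk>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <!--second pass over whatever is not a CJK ideograph-->
      <xsl:analyze-string select="." regex="\p{{IsBasicLatin}}+">
        <xsl:matching-substring>
          <latin><xsl:value-of select="."/></latin>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <other><xsl:value-of select="."/></other>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Nesting the second xsl:analyze-string inside the non-matching branch keeps each block test separate, so further blocks could be added the same way without building one large regular expression.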
--
G. Ken Holman mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/