Subject: Re: [xsl] Unicode character blocks in strings
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 26 May 2009 09:21:23 -0400
> I have a string containing a mix of Chinese and Latin characters, e.g. *|.Z'J%R+,H1N1,y7PZL. I wish to return a node set containing the following kind of structure:
> *|.Z'J%R+, H1N1 ,y7PZL
> Where H1N1 falls into the BasicLatin Unicode character block and the other two strings can be categorized as CJKUnifiedIdeographs.

Given that http://en.wikipedia.org/wiki/Basic_Latin_unicode_block lists the printable characters of that block as ending at the tilde, this can be done with a character range.

> Can anyone suggest the cleanest way to do this using XSLT 2?
I prefer Michael's and David's suggestion to use Unicode classes, but below is what I threw together quickly.
T:\ftemp>type tom.xml
<doc>?????H1N1??</doc>
T:\ftemp>call xslt2 tom.xml tom.xsl
<?xml version="1.0" encoding="UTF-8"?><doc><other>?????</other><latin>H1N1</latin><other>??</other></doc>
T:\ftemp>type tom.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:template match="text()" priority="1">
<xsl:analyze-string select="." regex="[!-~]+">
<xsl:matching-substring>
<latin><xsl:value-of select="."/></latin>
</xsl:matching-substring>
<xsl:non-matching-substring>
<other><xsl:value-of select="."/></other>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>

--
XSLT/XSL-FO/XQuery hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal
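
For comparison, the Unicode-class approach mentioned in this post could be sketched roughly as follows. This is not the stylesheet Michael or David posted. It is a minimal variant of the template above, assuming the block escapes \p{IsBasicLatin} and \p{IsCJKUnifiedIdeographs} from the XML Schema regular-expression syntax that XSLT 2.0 uses. The <cjk> element name is hypothetical, chosen only for illustration. Because the regex attribute is an attribute value template, the curly braces have to be doubled.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:template match="text()" priority="1">
  <!-- first pass: pull out runs of characters in the Basic Latin block -->
  <xsl:analyze-string select="." regex="\p{{IsBasicLatin}}+">
    <xsl:matching-substring>
      <latin><xsl:value-of select="."/></latin>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <!-- second pass over the remainder: CJK Unified Ideographs runs -->
      <xsl:analyze-string select="." regex="\p{{IsCJKUnifiedIdeographs}}+">
        <xsl:matching-substring>
          <!-- "cjk" is a hypothetical element name, not from the thread -->
          <cjk><xsl:value-of select="."/></cjk>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- anything in neither block -->
          <other><xsl:value-of select="."/></other>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

One difference from the character range [!-~]: the Basic Latin block spans U+0000 through U+007F, so spaces and other ASCII whitespace end up inside the <latin> runs rather than in <other>.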