Subject: Re: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions
From: John Besch <jbesch@xxxxxxx>
Date: Mon, 12 Jun 2006 15:25:34 -0400
> How, for example, to use a useful syntax like
> matches(.,'\p{Script:Arabic}+') ?
>
>schema-2 says: http://www.w3.org/TR/xmlschema-2/#regexs
>
>[Definition:] [Unicode Database] groups code points into a number of
>blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul
>Jamo, CJK Compatibility, etc. The set containing all characters that
>have block name X (with all white space stripped out), can be identified
>with a block escape \p{IsX}. The complement of this set is specified
>with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
>...
>For example,
>the "block escape" for identifying the ASCII characters is \p{IsBasicLatin}.
>
>so that would be \p{IsArabic}
>
>David
I want to use the above construct to detect Japanese characters, so I am using the
following XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />
  <xsl:template match="/text">
    <xsl:for-each select="tokenize(.,'\s+')">
      <word>
        <xsl:attribute name="language">
          <xsl:choose>
            <xsl:when test="matches(.,'\p{IsCJKCompatibility}+')">Japanese</xsl:when>
            <xsl:when test="matches(.,'\p{IsBasicLatin}+')">Latin</xsl:when>
            <xsl:otherwise>Unknown</xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
      </word>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
However, the Japanese characters in my input, which are encoded in UTF-8, come out flagged as Latin
or Unknown. What am I doing wrong? How do I get this to recognize the Japanese characters?
Thanks for any help you can offer.
John Besch
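
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the CJK Compatibility block covers only U+3300 to U+33FF (squared abbreviations and similar symbols), so ordinary kana and kanji fall outside \p{IsCJKCompatibility}. In addition, matches() is unanchored, so a token containing even one ASCII character satisfies '\p{IsBasicLatin}+'. The variant below assumes that a token counts as Japanese if it contains any character from the Hiragana, Katakana, or CJK Unified Ideographs blocks, and as Latin only if every character is in Basic Latin. The <words> wrapper and the xsl:value-of are additions for illustration, not part of the original stylesheet. Han ideographs are shared with Chinese, so a token made up only of ideographs cannot be attributed to Japanese with certainty.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Assumption: the input has a single root element named "text". -->
  <xsl:template match="/text">
    <!-- Hypothetical wrapper so the result has one root element. -->
    <words>
      <!-- normalize-space() avoids an empty first token when the text starts with whitespace. -->
      <xsl:for-each select="tokenize(normalize-space(.), '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <!-- Any kana or Han ideograph anywhere in the token (unanchored match). -->
              <xsl:when test="matches(., '[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]')">Japanese</xsl:when>
              <!-- Anchored: every character of the token must be in Basic Latin. -->
              <xsl:when test="matches(., '^\p{IsBasicLatin}+$')">Latin</xsl:when>
              <xsl:otherwise>Unknown</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <!-- Emit the token itself so the classification can be checked by eye. -->
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

Testing the Japanese branch first means a mixed token such as kana followed by ASCII punctuation is still reported as Japanese; swapping the order of the two xsl:when branches would not change that here, because the Latin test is anchored.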