Subject: Re: [xsl] Safe-guarding codepoints-to-string() from wrong input
From: Florent Georges <darkman_spam@xxxxxxxx>
Date: Wed, 20 Dec 2006 16:16:26 +0100 (CET)
Abel Braaksma wrote:
Hi
> I know that control characters are not allowed and throw
> an "Invalid XML character" error. Also, very large
> numbers (like "1234567") give several instances of the same
> error (I'm not sure why). Some characters (I believe these
> are the ones that are not assigned in Unicode) result in
> an empty string (like "12345").
> Is there a robust way of allowing/disallowing a set of
> codepoints (other than making one huge lookup list)?
Technically, it is not complex. Just define a function
my:codepoints-to-string() that makes the needed checks and
does what you want when encountering an invalid codepoint. I
think the most difficult part is identifying which
codepoints are valid. You can use the following from the
XML recommendation as a starting point:
    /* any Unicode character, excluding the surrogate
       blocks, FFFE, and FFFF. */
    [2] Char ::= #x9
               | #xA
               | #xD
               | [#x20-#xD7FF]
               | [#xE000-#xFFFD]
               | [#x10000-#x10FFFF]
Document authors are encouraged to avoid "compatibility
characters", as defined in section 6.8 of [Unicode] (see
also D21 in section 3.6 of [Unicode3]). The characters
defined in the following ranges are also
discouraged. They are either control characters or
permanently undefined Unicode characters:
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
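If you also want to reject the discouraged characters, the ranges quoted above can be checked without a lookup list. The following is only a sketch: the function name my:is-discouraged-codepoint and the namespace URI bound to the my prefix are my own assumptions, and the third range follows the text as quoted (I believe later editions of the recommendation extend it to #xFDEF). The last test relies on every plane from 1 to 16 ending in the two codepoints FFFE and FFFF.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: true if $cp is in one of the discouraged
       ranges: #x7F-#x84, #x86-#x9F, #xFDD0-#xFDDF, or the last two
       codepoints of planes 1 to 16 (#x1FFFE-#x1FFFF ... #x10FFFE-#x10FFFF). -->
  <xsl:function name="my:is-discouraged-codepoint" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
        ($cp ge 127 and $cp le 132)
        or ($cp ge 134 and $cp le 159)
        or ($cp ge 64976 and $cp le 64991)
        or ($cp ge 65536 and $cp le 1114111
            and $cp mod 65536 = (65534, 65535))"/>
  </xsl:function>

</xsl:stylesheet>
```

You could then combine it with whatever validity test you use, for instance by keeping only codepoints that are valid and not discouraged.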
When you have identified the valid codepoints, you
will have to choose what to do with the invalid ones.
For example, call codepoints-to-string() for valid
codepoints, and return the empty sequence (or the empty
string) for an invalid one:
    <!-- The xs and my prefixes must be declared on xsl:stylesheet. -->

    <!-- True if $down le $value le $up. -->
    <xsl:function name="my:is-in-range" as="xs:boolean">
      <xsl:param name="value" as="xs:integer"/>
      <xsl:param name="down" as="xs:integer"/>
      <xsl:param name="up" as="xs:integer"/>
      <xsl:sequence select="$value ge $down and $value le $up"/>
    </xsl:function>

    <!-- True if $cp matches the Char production:
         #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
         | [#x10000-#x10FFFF]. -->
    <xsl:function name="my:is-valid-codepoint" as="xs:boolean">
      <xsl:param name="cp" as="xs:integer"/>
      <xsl:sequence select="
          $cp = (9, 10, 13)
          or my:is-in-range($cp, 32, 55295)
          or my:is-in-range($cp, 57344, 65533)
          or my:is-in-range($cp, 65536, 1114111)"/>
    </xsl:function>

    <!-- Returns the character for a valid codepoint,
         or the empty sequence for an invalid one. -->
    <xsl:function name="my:codepoint-to-string" as="xs:string?">
      <xsl:param name="cp" as="xs:integer"/>
      <xsl:if test="my:is-valid-codepoint($cp)">
        <xsl:sequence select="codepoints-to-string($cp)"/>
      </xsl:if>
    </xsl:function>
Or instead the following, which takes a whole sequence and
silently drops the invalid codepoints, depending on your needs:

    <xsl:function name="my:codepoints-to-string" as="xs:string">
      <xsl:param name="cp" as="xs:integer*"/>
      <xsl:sequence select="
          codepoints-to-string($cp[my:is-valid-codepoint(.)])"/>
    </xsl:function>
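To try the filtering variant on its own, here is a sketch of a complete stylesheet. The namespace URI bound to the my prefix and the template name main are assumptions of mine; any URI and any entry point will do. With Saxon, for instance, a named template can be used as the entry point via the -it option.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- True if $down le $value le $up. -->
  <xsl:function name="my:is-in-range" as="xs:boolean">
    <xsl:param name="value" as="xs:integer"/>
    <xsl:param name="down" as="xs:integer"/>
    <xsl:param name="up" as="xs:integer"/>
    <xsl:sequence select="$value ge $down and $value le $up"/>
  </xsl:function>

  <!-- True if $cp matches the Char production of XML 1.0. -->
  <xsl:function name="my:is-valid-codepoint" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
        $cp = (9, 10, 13)
        or my:is-in-range($cp, 32, 55295)
        or my:is-in-range($cp, 57344, 65533)
        or my:is-in-range($cp, 65536, 1114111)"/>
  </xsl:function>

  <!-- Converts only the valid codepoints; the others are dropped. -->
  <xsl:function name="my:codepoints-to-string" as="xs:string">
    <xsl:param name="cp" as="xs:integer*"/>
    <xsl:sequence select="
        codepoints-to-string($cp[my:is-valid-codepoint(.)])"/>
  </xsl:function>

  <!-- Hypothetical entry point. 0 and 55296 (#xD800, a surrogate)
       are not XML characters, so only H, i and ! remain. -->
  <xsl:template name="main">
    <xsl:value-of
        select="my:codepoints-to-string((72, 0, 105, 55296, 33))"/>
  </xsl:template>

</xsl:stylesheet>
```

This should output "Hi!", where a direct call to codepoints-to-string() with the same sequence would raise an error.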
Regards,
--drkm