Subject: Re: [xsl] Safe-guarding codepoints-to-string() from wrong input
From: Florent Georges <darkman_spam@xxxxxxxx>
Date: Wed, 20 Dec 2006 16:16:26 +0100 (CET)
Abel Braaksma wrote:

  Hi

> I know that control characters are not allowed and throw
> an "Invalid XML character" error. Also, when adding very
> wide numbers (like "1234567") give a plural of the same
> error (Im not sure why). Some characters (I believe these
> are the ones that are not assigned in Unicode) result in
> an empty string (like "12345").

> Is there a robust way of allowing/disallowing a set of
> codepoints (other than making one huge lookup list)?

  Technically, it is not complex.  Just define a function
my:codepoints-to-string() that makes the needed checks and
does what you want when encountering an invalid codepoint.

  I think the most difficult part is identifying which
codepoints are valid.  You can use the following from the XML
Recommendation as a starting point:

    /* any Unicode character, excluding the surrogate
       blocks, FFFE, and FFFF. */
    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
               | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

    Document authors are encouraged to avoid "compatibility
    characters", as defined in section 6.8 of [Unicode] (see
    also D21 in section 3.6 of [Unicode3]). The characters
    defined in the following ranges are also discouraged.
    They are either control characters or permanently
    undefined Unicode characters:

    [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
    [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
    [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
    [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
    [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
    [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
    [#x10FFFE-#x10FFFF].

  When you have identified which codepoints are valid, you
will have to choose what to do with the invalid ones.
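
  The discouraged ranges quoted above follow a regular pattern: apart from three small ranges, they are the last two codepoints of each of planes 1 to 16. A minimal sketch of a test for them follows. It assumes XSLT 2.0, and both the function name my:is-discouraged-codepoint and the namespace URI bound to the "my" prefix are hypothetical, not something defined by the Recommendation or by any library.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: true when $cp falls in one of the ranges
       that the quoted passage of the XML Recommendation discourages. -->
  <xsl:function name="my:is-discouraged-codepoint" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
        ($cp ge 127 and $cp le 132)          (: #x7F-#x84 :)
        or ($cp ge 134 and $cp le 159)       (: #x86-#x9F :)
        or ($cp ge 64976 and $cp le 64991)   (: #xFDD0-#xFDDF :)
        or ($cp ge 65536 and $cp le 1114111
            and $cp mod 65536 ge 65534)      (: last two codepoints of planes 1 to 16 :)"/>
  </xsl:function>

</xsl:stylesheet>
```

  Whether discouraged codepoints are rejected outright or merely flagged is part of the same choice of what to do with input you do not want.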
  For example, you can call codepoints-to-string() for valid
codepoints, and return the empty sequence or the empty string
for invalid ones:

    <!-- True when $down <= $value <= $up. -->
    <xsl:function name="my:is-in-range" as="xs:boolean">
      <xsl:param name="value" as="xs:integer"/>
      <xsl:param name="down"  as="xs:integer"/>
      <xsl:param name="up"    as="xs:integer"/>
      <xsl:sequence select="$value ge $down and $value le $up"/>
    </xsl:function>

    <!-- True when $cp matches the Char production quoted above. -->
    <xsl:function name="my:is-valid-codepoint" as="xs:boolean">
      <xsl:param name="cp" as="xs:integer"/>
      <xsl:sequence select="
          $cp = (9, 10, 13)
            or my:is-in-range($cp, 32, 55295)
            or my:is-in-range($cp, 57344, 65533)
            or my:is-in-range($cp, 65536, 1114111)"/>
    </xsl:function>

    <!-- Single codepoint: empty sequence when $cp is invalid. -->
    <xsl:function name="my:codepoint-to-string" as="xs:string?">
      <xsl:param name="cp" as="xs:integer"/>
      <xsl:if test="my:is-valid-codepoint($cp)">
        <xsl:sequence select="codepoints-to-string($cp)"/>
      </xsl:if>
    </xsl:function>

or instead the following, depending on your needs:

    <!-- Sequence of codepoints: invalid ones are silently dropped. -->
    <xsl:function name="my:codepoints-to-string" as="xs:string">
      <xsl:param name="cp" as="xs:integer*"/>
      <xsl:sequence select="
          codepoints-to-string($cp[my:is-valid-codepoint(.)])"/>
    </xsl:function>

  Regards,

--drkm
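
  A third possibility, besides returning nothing or dropping invalid codepoints, is to substitute a visible placeholder. The sketch below is only one way to do it, written as a self-contained XSLT 2.0 stylesheet. The function name my:codepoints-to-string-or-replace, the "main" template, and the namespace URI bound to "my" are hypothetical names chosen for illustration. The choice of U+FFFD (65533, REPLACEMENT CHARACTER) as the placeholder is likewise an assumption.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <!-- True when $cp matches the XML 1.0 Char production. -->
  <xsl:function name="my:is-valid-codepoint" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
        $cp = (9, 10, 13)
        or ($cp ge 32 and $cp le 55295)
        or ($cp ge 57344 and $cp le 65533)
        or ($cp ge 65536 and $cp le 1114111)"/>
  </xsl:function>

  <!-- Hypothetical variant: each invalid codepoint is replaced by
       U+FFFD (65533) before the string is built. -->
  <xsl:function name="my:codepoints-to-string-or-replace" as="xs:string">
    <xsl:param name="cps" as="xs:integer*"/>
    <xsl:sequence select="
        codepoints-to-string(
          for $c in $cps
          return if (my:is-valid-codepoint($c)) then $c else 65533)"/>
  </xsl:function>

  <!-- Named entry point for trying the function without an input document.
       72 and 105 are "H" and "i", 1 is a disallowed control character,
       and 33 is "!". -->
  <xsl:template name="main">
    <result>
      <xsl:value-of
          select="my:codepoints-to-string-or-replace((72, 105, 1, 33))"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

  With Saxon, for instance, the named template can be invoked using the -it:main option. Unlike the filtering version, this keeps the output the same length as the input sequence, which can make a bad codepoint easier to locate.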