Re: [xsl] Filtering, xslt 2.0

Subject: Re: [xsl] Filtering, xslt 2.0
From: "C. M. Sperberg-McQueen cmsmcq@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 2 Nov 2022 17:50:43 -0000
"Liam R. E. Quin liam@xxxxxxxxxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes:

> [a brief somewhat pedantic side-track]

[And .. a brief thoroughly pedantic side-track from the side-track:]

> On Wed, 2022-11-02 at 14:19 +0000, Graydon graydon@xxxxxxxxx wrote:

>> ... This is true, though I would note that in general, the Unicode
>> character category,
>>
>> tokenize($param,',\p{Zs}*')
>>
>> can be safer. \s usually matches a space, a tab, a carriage return, a
>> line feed, or a form feed, but what the exact match is depends on the
>> regular expression implementation.
>
>
> For XSLT 2 and later it's defined as equivalent to the character class
> [&#x20\t\n\r] by XML Schema so there should not be any variation.
>
> Unicode properties, however, are defined by the Unicode Consortium and
> can vary over time - usually by additions.
>
> (actually XSD omits the "&" but i think we can safely say that's a typo
> and i seem to remember there may be an erratum about it.)
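
The difference between the two character classes discussed above can be checked mechanically. What follows is a minimal sketch in Python, not XSLT. It hard-codes \s as the four characters XSD specifies and approximates \p{Zs} with Python's `unicodedata.category`, so the Zs results reflect whatever Unicode version the local Python ships (reported by `unicodedata.unidata_version`). The helper names (`matches_xsd_s`, `matches_p_zs`, `tokenize_after_commas`) are hypothetical, and `tokenize_after_commas` only imitates `tokenize($param, ',C*')` for a single-character class C. It is not an XPath implementation.

```python
import unicodedata

# XSD (and hence XPath/XSLT 2.0 and later) defines the regex escape \s as
# the fixed class [#x20\t\n\r]: exactly these four code points.
XSD_S = {"\u0020", "\u0009", "\u000A", "\u000D"}


def matches_xsd_s(ch: str) -> bool:
    """True if ch is matched by \\s as XSD defines it."""
    return ch in XSD_S


def matches_p_zs(ch: str) -> bool:
    """True if ch is in Unicode general category Zs (space separator),
    according to the Unicode data bundled with this Python build."""
    return unicodedata.category(ch) == "Zs"


def tokenize_after_commas(s: str, in_class) -> list:
    """Hypothetical helper imitating XPath tokenize(s, ',C*') where C is a
    one-character class given as a predicate. Because C never contains ',',
    this equals splitting on ',' and dropping the leading run of
    C-characters from every piece after the first."""
    if s == "":
        return []  # XPath tokenize() returns the empty sequence for ""
    pieces = s.split(",")
    out = [pieces[0]]
    for piece in pieces[1:]:
        i = 0
        while i < len(piece) and in_class(piece[i]):
            i += 1
        out.append(piece[i:])
    return out


samples = {
    "SPACE": "\u0020",
    "TAB": "\u0009",
    "LINE FEED": "\u000A",
    "CARRIAGE RETURN": "\u000D",
    "NO-BREAK SPACE": "\u00A0",
    "EM SPACE": "\u2003",
}

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for name, ch in samples.items():
        print(f"{name:16} \\s={matches_xsd_s(ch)!s:5} \\p{{Zs}}={matches_p_zs(ch)}")
    # Compare the two tokenizing patterns on a newline and a no-break space.
    for text in ["a,\nb", "a,\u00A0b"]:
        print(repr(text),
              tokenize_after_commas(text, matches_xsd_s),
              tokenize_after_commas(text, matches_p_zs))
```

Under these assumptions the two patterns are not interchangeable: ',\s*' drops a newline after a comma but keeps a no-break space, while ',\p{Zs}*' does the reverse, because tab, CR and LF are in category Cc rather than Zs.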

For what it's worth, not a typo.  The XSD spec uses hash mark + 'x' +
hexadecimal number to refer to Unicode code points.  This is explained
in a note in section 4.3.6:

        Note: The notation #xA used here (and elsewhere in this
        specification) represents the Universal Character Set (UCS) code
        point hexadecimal A (line feed), which is denoted by U+000A.
        This notation is to be distinguished from &#xA;, which is the
        XML character reference to that same UCS code point.
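
The distinction drawn in that note can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch only; nothing in it is specific to XSD processors. `#xA` is prose notation naming a code point, whereas `&#xA;` is markup that an XML parser replaces with the character itself.

```python
import xml.etree.ElementTree as ET

# "#xA" in the specs is prose notation: "#x" followed by a hex number
# naming a code point, here U+000A LINE FEED.
code_point = int("A", 16)
line_feed = chr(code_point)

# "&#xA;" is an XML character reference: the parser replaces it with
# the character U+000A in the element's text.
with_ref = ET.fromstring("<p>one&#xA;two</p>")

# Inside a document, "#xA" is just three ordinary characters.
without_ref = ET.fromstring("<p>one#xAtwo</p>")

if __name__ == "__main__":
    print(repr(line_feed))         # '\n'
    print(repr(with_ref.text))     # 'one\ntwo'
    print(repr(without_ref.text))  # 'one#xAtwo'
```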


--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
