Re: [xsl] Filtering, xslt 2.0

Subject: Re: [xsl] Filtering, xslt 2.0
From: "Liam R. E. Quin liam@xxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 2 Nov 2022 16:34:00 -0000
[a brief somewhat pedantic side-track]

On Wed, 2022-11-02 at 14:19 +0000, Graydon graydon@xxxxxxxxx wrote:
> On Wed, Nov 02, 2022 at 02:10:09PM -0000, Eliot Kimber
> eliot.kimber@xxxxxxxxxxxxxxx scripsit:
> > The second argument to tokenize() is a regular expression, so ", *"
> > means
> > "comma followed by zero or more spaces".
> >
> > I would write it as ",\s*", which is clearer and handles all white
> > space
> > (space, tab, etc.).
>
> This is true, though I would note that in general, the Unicode
> character
> category,
>
> tokenize($param,',\p{Zs}*')
>
> can be safer. \s usually matches a space, a tab, a carriage return, a
> line feed, or a form feed, but what the exact match is depends on the
> regular expression implementation.


For XSLT 2 and later it's defined as equivalent to the character class
[&#x20;\t\n\r] by XML Schema, so there should not be any variation.

Unicode properties, however, are defined by the Unicode Consortium and
can vary over time - usually by additions.

(actually XSD omits the "&" but i think we can safely say that's a typo
and i seem to remember there may be an erratum about it.)

>  Whereas you know what Zs,
> "Separator, spaces", is and unlike \s it includes U+00A0 NO-BREAK
> SPACE.

That's true. Well, the second part is true; the first part strictly
speaking requires checking the Unicode version in use by the
implementation and then looking up the corresponding information.


But i doubt many people find \p{Zs} clearer than \s, and \s is likely
fine for this usage :)
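
To make the difference concrete, here is a minimal sketch, assuming an
XSLT 2.0 processor started at a named template. The template name "main"
and the sample value of $param are made up for illustration; they are
not from the original question. The value has a tab after the first
comma and a no-break space after the second.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input: "a", comma, TAB, "b", comma, NO-BREAK SPACE, "c" -->
  <xsl:variable name="param" select="'a,&#x9;b,&#xA0;c'"/>

  <xsl:template name="main">
    <!-- \s is [#x20\t\n\r]: the tab is consumed, the no-break space
         stays attached to the front of the last token -->
    <xsl:text>\s:      </xsl:text>
    <xsl:value-of select="for $t in tokenize($param, ',\s*')
                          return concat('[', $t, ']')"/>
    <xsl:text>&#xA;</xsl:text>

    <!-- \p{Zs} covers space separators such as U+00A0 but not the tab,
         so here the tab stays attached to the front of the second token -->
    <xsl:text>\p{Zs}:  </xsl:text>
    <xsl:value-of select="for $t in tokenize($param, ',\p{Zs}*')
                          return concat('[', $t, ']')"/>
    <xsl:text>&#xA;</xsl:text>

    <!-- Both escapes in one character class, if the input may contain
         either kind of white space -->
    <xsl:text>both:    </xsl:text>
    <xsl:value-of select="for $t in tokenize($param, ',[\s\p{Zs}]*')
                          return concat('[', $t, ']')"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With a value that contains only ordinary spaces after the commas, all
three expressions should give the same tokens, which is why \s is
probably enough for the original use case.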

Perl allows \p{Space} - see perldoc perluniprops - and for Zs,
\p{Space_Separator}, as well as \p{Zs}. I wish XML Schema had included
the longer names, although hard-wiring that much English makes
internationalization people nervous.

liam

--
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
