Re: [xsl] Filtering, xslt 2.0

Subject: Re: [xsl] Filtering, xslt 2.0
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 2 Nov 2022 14:18:47 -0000
On Wed, Nov 02, 2022 at 02:10:09PM -0000, Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx scripsit:
> The second argument to tokenize() is a regular expression, so ", *" means
> "comma followed by zero or more spaces".
>
> I would write it as ",\s*", which is clearer and handles all white space
> (space, tab, etc.).

This is true, though I would note that, in general, matching on the
Unicode character category, as in

tokenize($param,',\p{Zs}*')

can be safer. \s usually matches a space, a tab, a carriage return, a
line feed, or a form feed, but exactly what it matches depends on the
regular expression implementation.  Zs, by contrast, has a fixed
definition: it is the Unicode category "Separator, space", and unlike \s
in XPath regular expressions it includes U+00A0 NO-BREAK SPACE.

Less of a concern with a param but potentially helpful with document
content when you mean "spaces between words" more than you mean "a
pre-Unicode general definition of white space".
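
A minimal sketch of the difference, assuming an XSLT 2.0 processor
started at a named template; the template name "main" and the sample
parameter value are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical value: a plain space after the first comma,
       U+00A0 NO-BREAK SPACE (&#xA0;) after the second -->
  <xsl:param name="param" select="'a, b,&#xA0;c'"/>

  <xsl:template name="main">
    <!-- \s in XPath regexes is space, tab, LF, CR, so the
         no-break space stays attached to the third token -->
    <xsl:value-of select="tokenize($param, ',\s*')" separator="|"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- \p{Zs} covers both U+0020 and U+00A0, so both are consumed -->
    <xsl:value-of select="tokenize($param, ',\p{Zs}*')" separator="|"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With \s the third token should still begin with the no-break space;
with \p{Zs} all three tokens come out clean. Note that Zs does not
include tab or line breaks, so if those can also follow the comma,
something like ',[\s\p{Zs}]*' would cover both cases.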

-- 
Graydon Saunders  | graydonish@xxxxxxxxx
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")
