Subject: Re: [xsl] Filtering, xslt 2.0
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 2 Nov 2022 14:18:47 -0000
On Wed, Nov 02, 2022 at 02:10:09PM -0000, Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx scripsit:
> The second argument to tokenize() is a regular expression, so ", *" means
> "comma followed by zero or more spaces".
>
> I would write it as ",\s*", which is clearer and handles all white space
> (space, tab, etc.).
This is true, though I would note that, in general, matching on the
Unicode character category,
tokenize($param,',\p{Zs}*')
can be safer. \s usually matches a space, a tab, a carriage return, a
line feed, or a form feed, but what the exact match is depends on the
regular expression implementation. By contrast, you know exactly what Zs,
"Separator, space", is, and unlike \s it includes U+00A0 NO-BREAK SPACE.
Less of a concern with a param but potentially helpful with document
content when you mean "spaces between words" more than you mean "a
pre-Unicode general definition of white space".
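
To make the difference concrete, here is a minimal XSLT 2.0 sketch that
tokenizes one string with three separators. The parameter value and the
named template "main" are invented for illustration (with Saxon it could
be started via -it:main), and translate() is only there to make U+00A0
and the tab visible as ~ and ^. It assumes the XPath 2.0 regex flavour,
where \s is space, tab, line feed and carriage return.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical value: the three commas are followed by
       U+00A0 NO-BREAK SPACE, a plain space, and a tab. -->
  <xsl:param name="param" as="xs:string" select="'a,&#xA0;b, c,&#x9;d'"/>

  <!-- Hypothetical entry point; run as the initial named template. -->
  <xsl:template name="main">
    <!-- Try each separator regex in turn; "." is the current regex. -->
    <xsl:for-each select="(',\s*', ',\p{Zs}*', ',[\s\p{Zs}]*')">
      <xsl:value-of select="."/>
      <xsl:text> => </xsl:text>
      <!-- Join the tokens with "|" and show U+00A0 as ~ and tab as ^. -->
      <xsl:value-of select="translate(string-join(tokenize($param, .), '|'),
                                      '&#xA0;&#x9;', '~^')"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With a conforming processor this should show ,\s* leaving the no-break
space attached to "b" (a|~b|c|d), and ,\p{Zs}* leaving the tab attached
to "d" (a|b|c|^d), since a tab is a control character rather than a
member of Zs. The combined class ,[\s\p{Zs}]* is one way to cover both
when that is what you want.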
--
Graydon Saunders | graydonish@xxxxxxxxx
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")