Subject: Re: [xsl] Filtering, xslt 2.0
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 2 Nov 2022 14:18:47 -0000
On Wed, Nov 02, 2022 at 02:10:09PM -0000, Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx scripsit:
> The second argument to tokenize() is a regular expression, so ", *" means
> "comma followed by zero or more spaces".
>
> I would write it as ",\s*", which is clearer and handles all white space
> (space, tab, etc.).

This is true, though I would note that, in general, using a Unicode character category, as in tokenize($param, ',\p{Zs}*'), can be safer. \s usually matches a space, a tab, a carriage return, a line feed, or a form feed, but exactly what it matches depends on the regular expression implementation. With Zs ("Separator, space"), by contrast, you know what you are getting, and unlike \s it includes U+00A0 NO-BREAK SPACE.

This is less of a concern with a param, but it can help with document content, when what you mean is "spaces between words" rather than "a pre-Unicode general definition of white space".

-- 
Graydon Saunders | graydonish@xxxxxxxxx
Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")
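The following is a minimal sketch of the NO-BREAK SPACE difference. It is written in Java rather than XSLT only because Java is easy to run standalone. It assumes java.util.regex as a stand-in for the XPath regex engine, and the class name TokenizeSeparators and the sample parameter value are hypothetical. The two patterns are the ones discussed in the thread, ',\s*' and ',\p{Zs}*'.

The dialects do differ, which is the point about implementations. Java's default \s is [ \t\n\x0B\f\r]. In XPath, whose regex syntax is based on XML Schema, \s is [#x20\t\n\r]. Neither matches U+00A0, and that is the case the sketch exercises.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for the XPath regex engine.
public class TokenizeSeparators {

    // Split input at every match of regex, much as XPath tokenize() does for
    // this input (Java's split drops trailing empty strings; tokenize does not).
    public static String[] split(String input, String regex) {
        return Pattern.compile(regex).split(input);
    }

    // Print each token in brackets with the code point of its first character,
    // so a leading NO-BREAK SPACE is visible.
    private static void show(String label, String[] tokens) {
        System.out.println(label);
        for (String t : tokens) {
            System.out.printf("  [%s] first char U+%04X%n", t, (int) t.charAt(0));
        }
    }

    public static void main(String[] args) {
        // Hypothetical value: "b" follows an ordinary space, "c" follows NO-BREAK SPACE.
        String param = "a, b,\u00A0c";

        // Java's default \s does not match NO-BREAK SPACE, so it stays on the last token.
        show(",\\s*", split(param, ",\\s*"));

        // \p{Zs} matches both the ordinary space and NO-BREAK SPACE, so both are stripped.
        show(",\\p{Zs}*", split(param, ",\\p{Zs}*"));
    }
}
```

Zs does not include tab, carriage return, or line feed, because those are control characters (category Cc). As a result, ',\p{Zs}*' is not a drop-in superset of ',\s*'. Which one fits depends on what the data actually contains.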