RE: [xsl] lookaheads in XSLT2 regexes

Subject: RE: [xsl] lookaheads in XSLT2 regexes
From: Liam R E Quin <liam@xxxxxx>
Date: Thu, 04 Mar 2010 00:59:44 -0500
On Wed, 2010-03-03 at 21:27 +0000, Michael Kay wrote:
> > On the subject of \b I'll note we do have \W and \w 
> 
> So we do, I overlooked that. And we define it a little differently from
> Perl:
> 
> [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] 
> 
> So for example "+" is regarded as part of a word, while "-" isn't. Which
> strikes me as totally useless, to be honest.

I agree.

We could fix that for XPath 2.1 I think.  I'm not sure what the most
useful fix would be, I admit.

The Perl definition of "alphanumeric" plus "_" would probably work for
\w, if one took alphanumeric to mean Letters|Numbers, \p{L}|\p{N}.
That is coincidentally closer to what you get in Perl if you do
    use locale;
and your locale is (say) en_GB.UTF-8, as it's then the same as
the POSIX fragment [[:alpha:][:digit:]_]
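
To make the difference concrete, here is a rough sketch in Python (not
XPath) that classifies characters by Unicode general category.  The
function names are made up for illustration: xsd_word_char follows the
definition quoted above, and proposed_word_char is only the
\p{L}|\p{N} plus "_" idea, not anything taken from a spec.

```python
import unicodedata

def xsd_word_char(ch):
    # \w as quoted above: everything except \p{P}, \p{Z} and \p{C}.
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")

def proposed_word_char(ch):
    # Hypothetical replacement: \p{L} | \p{N} | "_".
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")

# Print category and both verdicts for a few interesting characters.
for ch in "a5+-_":
    print(repr(ch), unicodedata.category(ch),
          xsd_word_char(ch), proposed_word_char(ch))
```

Note that "_" is Pc (connector punctuation), so the current definition
excludes it along with "-", while "+" (Sm) gets through.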

There are lots of things that could be added to regular expressions;
but \b is hard to emulate, useful, and also we seem to have a rather
odd \w.  If \w is there, I think \b was omitted by mistake.  Or that
\w was included by mistake!
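
As a sketch of why \b depends so directly on \w, here is the usual
meaning of \b written out by hand in Python: a position is a boundary
when exactly one of its two neighbours is a word character.  The
is_word_char test below assumes the \p{L}|\p{N} plus "_" definition,
which is only the suggestion above and not what the spec currently
says, and both function names are invented for the example.

```python
import unicodedata

def is_word_char(ch):
    # Assumed \w: letters, numbers and "_" (a proposal, not the current spec).
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")

def word_boundaries(text):
    # Offsets where \b would match: word-ness differs across the position.
    # The start and end of the string count as non-word.
    positions = []
    for i in range(len(text) + 1):
        before = i > 0 and is_word_char(text[i - 1])
        after = i < len(text) and is_word_char(text[i])
        if before != after:
            positions.append(i)
    return positions

print(word_boundaries("foo-bar baz"))  # [0, 3, 4, 7, 8, 11]
```

Swap in a different is_word_char and the boundaries move with it, so
an odd \w would give an equally odd \b.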
 
Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
