RE: [xsl] lookaheads in XSLT2 regexes

Subject: RE: [xsl] lookaheads in XSLT2 regexes
From: Liam R E Quin <liam@xxxxxx>
Date: Wed, 03 Mar 2010 16:09:38 -0500

On Tue, 2010-03-02 at 09:21 +0000, Michael Kay wrote:

> I would imagine there would also be raised eyebrows about including "_" in
> the set of "word" characters. That's something that only happens in geekdom.
> But in the past the principle has been "if Perl defines it well, do what
> Perl does, otherwise leave it out completely." In my view we've already
> copied too many of Perl's mistakes, like the strange rules on recognizing
> whether \12 is a back-reference to group 12 or a back-reference to group 1
> followed by a digit 2.

I don't remember what first introduced back-references beyond 9;
it might have been sed.

More recently (since 5.10) Perl has provided named capture buffers, so
you no longer have to use numbers, and also \g for back-references:

\g{12}   # buffer 12, never buffer 1 followed by a digit 2
\g{-1}   # relative: the most recently opened buffer
and with (?<sock> ....pattern.... ) ..... \g{sock}
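
For anyone who wants to try named back-references without Perl to hand,
here is a minimal sketch in Python's re module, chosen only because it is
easy to run. Note the spelling differs from Perl's: Python writes
(?P<name>...) and (?P=name). The group name "sock", the helper has_pair
and the sample strings are illustrative assumptions, not anything from
the spec discussion.

```python
import re

# Numbered back-reference: \1 must repeat whatever group 1 matched.
numbered = re.compile(r"(\w+) and \1")

# Named back-reference: Python spells it (?P<name>...) and (?P=name),
# where Perl uses (?<name>...) and \g{name}.
named = re.compile(r"(?P<sock>\w+) and (?P=sock)")


def has_pair(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return pattern.search(text) is not None
```

The named form reads the same however many other groups are later added
to the pattern, which is the main attraction over counting parentheses.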

.NET and Perl regexps are incompatible in what happens if you mix
the (...) and \1 with named buffers -- Perl numbers named and unnamed
buffers together, left to right, while .NET numbers the unnamed ones
first and gives the named ones the numbers after those.
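
The numbering question is easy to check for any one engine. As a rough
sketch, Python's re module (used here only as a convenient test bed; it
happens to follow the Perl convention, and .NET is not exercised) exposes
the number assigned to each named group through groupindex. The pattern
and the group name "mid" are made up for the example.

```python
import re

# One unnamed group, then a named group, then another unnamed group.
pattern = re.compile(r"(a)(?P<mid>b)(c)")

# groupindex maps each group name to the number it was assigned.
# Python counts named and unnamed groups together, left to right,
# so "mid" is group 2 and the trailing (c) is group 3.
numbers = dict(pattern.groupindex)

match = pattern.match("abc")
third = match.group(3)  # the text captured by the trailing (c)
```

An engine that numbers unnamed groups first would instead make (c)
group 2, so a \2 in the same pattern would mean something different.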

On the subject of \b I'll note we do have \W and \w -- Perl at least
defines \b as a boundary between \W and \w.  It _is_ crazy that \b in
a character class represents backspace.  Perl also has \B to match at a
non-word boundary -- between \w and \w or between \W and \W.
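
A small sketch of those three behaviours, again in Python's re module
purely for convenience (it treats \b, \B and [\b] the same way Perl
does in these cases); the sample text is invented. It also shows "_"
being treated as a word character.

```python
import re

text = "cat concatenate cat_flap"

# \b is zero-width: it matches between a \w and a \W character, or at
# either end of the string next to a \w. "_" counts as \w, so the
# "cat" in "cat_flap" has no boundary after it.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B matches exactly where \b does not, e.g. between two \w characters,
# so it finds only the "cat" buried inside "concatenate".
inside_words = [m.start() for m in re.finditer(r"\Bcat\B", text)]

# Inside a character class, \b is no longer a boundary: it is the
# backspace character (U+0008).
backspaces = re.findall(r"[\b]", "a\bb")
```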

Historically, the Unix vi editor used (uses) \< for matching \W\w (i.e.
the start of a "word") and \> for the end, \w\W, which always seemed a
little clearer to me, but for use with XML we need to stay away from
assigning meaning to < and > I think :-)
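
In an engine that has lookarounds, the two vi anchors can be
approximated without any new syntax. This is only a sketch in Python's
re module: the names WORD_START and WORD_END are hypothetical, and
unlike a literal \W\w the emulation also matches at the very start and
end of the string.

```python
import re

# Start of word (vi's \<): no \w behind us, a \w ahead of us.
WORD_START = r"(?<!\w)(?=\w)"

# End of word (vi's \>): a \w behind us, no \w ahead of us.
WORD_END = r"(?<=\w)(?!\w)"

text = "old books, new books"

# Both patterns are zero-width, so each match is just a position.
starts = [m.start() for m in re.finditer(WORD_START, text)]
ends = [m.start() for m in re.finditer(WORD_END, text)]
```

Each start position pairs with the end position that follows it, so
text[0:3] is "old", text[4:9] is "books", and so on.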

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
