Re: [xsl] xslt 2.0 regex

Subject: Re: [xsl] xslt 2.0 regex
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Sat, 17 Mar 2012 19:26:28 +0100
I don't think that \i and \c are equivalent to NameStartChar and NameChar.
-W

On 17 March 2012 19:13, Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Mar 17, 2012 at 2:01 PM, davep <davep@xxxxxxxxxxxxx> wrote:
>> On 17/03/12 17:38, Brandon Ibach wrote:
>>> Ignoring that for the moment, though, Tony pointed out one consequence
>>> of this, but the bigger issue is that the "|" operator in regex is
>>> fairly low precedence, so you often need some parentheses around the
>>> list of alternatives to get things right.  Your NameStartChar.re has a
>>> hex-char-ref-encoded "$" at the beginning, so that regex is actually
>>> "$[A-Z] | _ | [a-z] | ...", which means it will match "(a dollar sign
>>> followed by an upper-case English letter) or (an underscore) or (a
>>> lower-case English letter) or ...".
>>
>> Which is correct (less the missing :); isn't that what they call the
>> startChar?
>
> Not quite.  Notice the parens I put in the "translation".  The dollar
> sign is only matched when it is followed by an upper-case English
> letter.  A variable name starting with anything else won't match
> correctly.  You'd need parens around the list of alternatives to make
> sure the "|" only applies to the upper-case-letter pattern and not the
> dollar sign.  Since you re-use NameStartChar.re in NameChar.re,
> however, that won't work.  Moving the dollar sign to Name.re solves
> that problem.
>
>>> All that said, I got this to work by dropping the "$" from the start
>>> of NameStartChar.re and changing Name.re to:
>>>
>>> concat("\$(", $NameStartChar.re, ")(", $NameChar.re,")*")
>>
>> presumably because $ is within the a-z grouping....
>> Yes it works.
>>
>> It should be possible (and simpler) to use Tony's idea \i\c+, since : is
>> valid in a variable name. Not a syntax I've used before, I think?
>
> QName is actually defined (in more EBNF-like notation) as "(NCName
> ':')? NCName", which allows only one ":" under specific circumstances,
> so your best bet is probably something like this, based on Tony's
> later version:
>
> ([\i-[:]][\c-[:]]*:)?[\i-[:]][\c-[:]]*
>
> -Brandon :)
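
To see the precedence problem Brandon describes, here is a minimal sketch in Python rather than XSLT. Python's re module is only a stand-in, chosen because "|" has the lowest precedence there just as it does in the XPath 2.0 regex dialect. NAME_START_CHAR and NAME_CHAR are hypothetical, ASCII-only simplifications of the thread's NameStartChar.re and NameChar.re, and re.fullmatch plays the role of anchoring the XPath pattern with ^ and $.

```python
import re

# Hypothetical ASCII-only stand-ins for the thread's NameStartChar.re and
# NameChar.re; each is a bare, ungrouped list of "|" alternatives.
NAME_START_CHAR = r"[A-Z]|_|[a-z]"
NAME_CHAR = NAME_START_CHAR + r"|-|\.|[0-9]"

# Original approach: "$" glued onto the front of the alternatives, so it
# binds only to the first one: (\$[A-Z]) | (_) | ([a-z]).
BROKEN = r"\$" + NAME_START_CHAR

# Brandon's fix: keep "$" outside and parenthesise each alternative list,
# mirroring concat("\$(", $NameStartChar.re, ")(", $NameChar.re, ")*").
NAME = r"\$(" + NAME_START_CHAR + r")(" + NAME_CHAR + r")*"


def broken_matches(text: str) -> bool:
    """Whole-string match against the ungrouped pattern."""
    return re.fullmatch(BROKEN, text) is not None


def is_variable_name(text: str) -> bool:
    """Whole-string match against the grouped pattern."""
    return re.fullmatch(NAME, text) is not None


if __name__ == "__main__":
    for s in ["$F", "$foo", "_", "a", "$a-b.c", "$1a"]:
        print(f"{s!r:10} broken={broken_matches(s)} fixed={is_variable_name(s)}")
```

With the ungrouped pattern "$foo" is rejected while a lone "_" or "a" is accepted, which is exactly the mis-parse described in the thread; the grouped version accepts "$foo" and "$a-b.c" and rejects anything without the leading "$".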

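Brandon's final pattern relies on \i, \c and XSD character-class subtraction ([\i-[:]] means "any \i character except a colon"), none of which Python's re module supports. The sketch below, again in Python purely as a stand-in, hand-expands those classes into hypothetical ASCII-only sets that already exclude ":" (a deliberate simplification; the real classes cover far more of Unicode). It only illustrates that the (NCName ':')? NCName shape accepts at most one colon, and never one at the start or end.

```python
import re

# Hypothetical ASCII-only approximations of [\i-[:]] and [\c-[:]]:
# name-start and name characters with the colon removed.
NCNAME_START = r"[A-Za-z_]"
NCNAME_CHAR = r"[A-Za-z0-9_.\-]"
NCNAME = NCNAME_START + NCNAME_CHAR + "*"

# (NCName ':')? NCName, the same shape as
# ([\i-[:]][\c-[:]]*:)?[\i-[:]][\c-[:]]*
QNAME = "(" + NCNAME + ":)?" + NCNAME


def is_qname(text: str) -> bool:
    """True if the whole string has the QName shape (ASCII subset only)."""
    return re.fullmatch(QNAME, text) is not None


if __name__ == "__main__":
    for s in ["foo", "xsl:template", "a:b:c", ":foo", "foo:", "1a"]:
        print(f"{s!r:16} {is_qname(s)}")
```

Because the colon is excluded from both character classes, the only place a ":" can appear is the single literal one between prefix and local name.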