Re: [xsl] regular expressions in XSLT 2.0

Subject: Re: [xsl] regular expressions in XSLT 2.0
From: Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx>
Date: Sun, 28 Aug 2011 17:32:37 -0400

This is a case of operator precedence (in a sense, anyway).  The
specification of regular expression syntax ([1], with modifications in
[2]) says:

regExp ::= branch ( '|' branch )*
branch ::= piece*
piece ::= atom quantifier?
atom ::= Char | charClass | ( '(' regExp ')' )
charClass ::= charClassEsc | charClassExpr | WildcardEsc | "^" | "$"
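
The grammar can be made concrete with a short sketch. The Python function below splits a pattern at each "|" that sits outside parentheses and square brackets. That is where the rule regExp ::= branch ( '|' branch )* divides it. Note that `top_level_branches` is a hypothetical helper, not part of Saxon or any XSLT API. The sketch skips escaped characters, but it is not a full parser or validator for the XSD/XPath regex syntax.

```python
from typing import List


def top_level_branches(regex: str) -> List[str]:
    """Split a regex at every '|' that is not inside (...) or [...].

    Simplified illustration of regExp ::= branch ('|' branch)*:
    tracks group and character-class nesting and treats an escape
    such as \\d or \\| as one unit; it does not validate the pattern.
    """
    branches = []
    current = []
    paren_depth = 0    # nesting level of ( ... )
    bracket_depth = 0  # nesting level of [ ... ] (XSD allows [a-z-[aeiou]])
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex):
            # An escape is a single unit and never acts as a separator.
            current.append(regex[i:i + 2])
            i += 2
            continue
        if ch == "[":
            bracket_depth += 1
        elif ch == "]" and bracket_depth > 0:
            bracket_depth -= 1
        elif bracket_depth == 0:
            if ch == "(":
                paren_depth += 1
            elif ch == ")":
                paren_depth -= 1
            elif ch == "|" and paren_depth == 0:
                # Top-level alternation: close off the current branch.
                branches.append("".join(current))
                current = []
                i += 1
                continue
        current.append(ch)
        i += 1
    branches.append("".join(current))
    return branches


print(top_level_branches(r"^\d{1,3}|[ivxl]{1,7}$"))
# ['^\\d{1,3}', '[ivxl]{1,7}$']
print(top_level_branches(r"^(\d{1,3}|[ivxl]{1,7})$"))
# ['^(\\d{1,3}|[ivxl]{1,7})$']
```

Without the parentheses the pattern has two branches, each carrying only one anchor. With them, there is a single branch whose anchors surround the whole group.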

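Thus, the "^" and "$" are each, in turn, a charClass, an atom and a
piece, and each forms a branch together with its adjacent pieces; the
branches are then joined by the alternation operator ("|"), which
binds more loosely than anything else.  So, your original expression
matches either 1) a string that starts with the part before the bar or
2) a string that ends with the part after it.  Since matches() returns
true if the pattern matches any part of the input, "40e" qualifies
under 1): it starts with the digits "40".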
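
This can be checked quickly outside XSLT. The sketch below uses Python's re module as a stand-in for an XSLT processor. It relies on the assumption that re.search() behaves like fn:matches() here, since both succeed if any substring matches. Python's regex dialect is not identical to XPath's; for example, Python's "$" can also match just before a final newline.

```python
import re

# re.search() stands in for XPath's fn:matches(): both succeed if the
# pattern matches any substring of the input.
pattern = r"^\d{1,3}|[ivxl]{1,7}$"

m = re.search(pattern, "40e")
print(m.group(0))  # prints 40: the first branch, ^\d{1,3}, matched

# Trying each top-level branch on its own shows which one is responsible.
for branch in (r"^\d{1,3}", r"[ivxl]{1,7}$"):
    print(branch, re.search(branch, "40e") is not None)
# ^\d{1,3} True
# [ivxl]{1,7}$ False
```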

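By putting the parentheses in, you've put the alternation expression
in sequence with the "^" and "$", so they both must match, along with
one of the alternatives inside the parens.  The whole string must then
be either one to three digits or one to seven of the letters i, v, x
and l, and "40e" is neither, so the test is false.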
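
The sketch below applies the grouped pattern to a few inputs, again with Python's re.search() standing in for fn:matches(). The extra test strings are illustrative choices, not taken from the original question.

```python
import re

# Stand-in for matches(..., '^(\d{1,3}|[ivxl]{1,7})$') using Python's re.
grouped = r"^(\d{1,3}|[ivxl]{1,7})$"

# The anchors now surround the whole group, so the entire string must be
# one to three digits, or one to seven characters from i, v, x, l.
for s in ("40e", "40", "xiv", "1234"):
    print(s, re.search(grouped, s) is not None)
# 40e False
# 40 True
# xiv True
# 1234 False
```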

I'd say the tool you tried, which returned false for the first test,
was either implementing a version of regular expressions with
different defined semantics or was simply wrong.
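
As an illustration of such a difference in semantics, some regex APIs require the whole input to match rather than any substring. The thread does not say which API the test page uses, so this is only one possibility. The sketch contrasts the two behaviours with Python's re.search() and re.fullmatch(). Java's String.matches() likewise requires the entire input to match.

```python
import re

pattern = r"^\d{1,3}|[ivxl]{1,7}$"

# "Match any substring" semantics, as in XPath's fn:matches():
print(re.search(pattern, "40e") is not None)     # True

# "Match the whole string" semantics (Python's re.fullmatch()). Precedence
# is unchanged, but whichever branch is tried must now consume all of "40e".
print(re.fullmatch(pattern, "40e") is not None)  # False
```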

-Brandon :)

[1] http://www.w3.org/TR/xmlschema-2/#regexs
[2] http://www.w3.org/TR/xpath-functions/#regex-syntax


On Sun, Aug 28, 2011 at 5:12 PM, Wolfhart Totschnig
<wolfhart@xxxxxxxxxxxxx> wrote:
> Hello,
>
> I have a question about regular expressions in XSLT 2.0. I noticed that
>
> test="matches('40e','^\d{1,3}|[ivxl]{1,7}$')"
>
> will be evaluated as true, which puzzles me, since I thought it should be
> evaluated as false. (A regular expressions test page I found on the internet
> (http://www.fileformat.info/tool/regex.htm) indeed evaluates the test as
> false.)
>
> When I add parentheses in the regular expression, i.e.,
>
> test="matches('40e','^(\d{1,3}|[ivxl]{1,7})$')"
>
> the test comes out false, however.
>
> So my question is this: Why does the test without the parentheses come out
> true? That is, how is the regular expression interpreted by the XSLT engine
> such that "40e" is considered a match? And why do the parentheses make a
> difference? (I thought the parentheses would be redundant in this case.) Or
> is this maybe an issue specific to the XSLT engine I use (Saxon9he)?
>
> Thanks in advance for your help!
> Wolfhart
