Re: [xsl] Regex groups / was: Re: [xsl] Move leading/trailing spaces outside (XSLT 2.0)

From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Tue, 06 Feb 2007 17:54:49 +0100
Yves Forkl wrote:

Though familiar with RegExp functions in various languages, I am not sure how this one works, so I wonder what happens when the condition of the first regex group cannot be satisfied. Supposing that the second matches, will I then have to refer to it as regex-group(1) or still as regex-group(2)? I.e., do the numbers index the groups defined in the regex or the substrings that actually matched one of the groups?



Quite simply (and unlike some other regex dialects), the number you pass to regex-group() always refers to the group opened by the parenthesis with the same index, whether or not that group took part in the match. In other words: count the "(" characters in your regex from the left, and regex-group(x) holds whatever was captured by the group opened by the x-th one.


I.e., text is: "this is a text"
regex: (t(.+)(.*))

will put the whole string 'this is a text' in regex-group(1) (the outermost parenthesis is the first one opened), 'his is a text' in regex-group(2) and nothing in regex-group(3) (greedy matching lets (.+) consume the rest of the string, so (.*) is left with an empty match, but that is another story). The same numbering holds when the regex has alternative branches (separated by '|'), where you might expect the count to shift depending on which branch matched. Not so in XSLT, i.e.:

regex: (t)|(.)

will put every 't' in regex-group(1) and every other character in regex-group(2). For each individual match, the group that did not take part is the zero-length string.
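
To see the numbering for yourself, here is a minimal sketch of a stylesheet that runs both regexes above against the same text. The template name "main" and the bracketed output format are just my choices for illustration; any XSLT 2.0 processor should do.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Apply to any XML input, or start at the named template "main". -->
  <xsl:template match="/" name="main">

    <!-- Nested groups: numbered by the position of each "(" from the left. -->
    <xsl:analyze-string select="'this is a text'" regex="(t(.+)(.*))">
      <xsl:matching-substring>
        <xsl:value-of select="concat('1=[', regex-group(1), '] ',
                                     '2=[', regex-group(2), '] ',
                                     '3=[', regex-group(3), ']&#10;')"/>
      </xsl:matching-substring>
    </xsl:analyze-string>

    <!-- Alternation: one match per character, and the group of the
         branch that did not match is the zero-length string. -->
    <xsl:analyze-string select="'this is a text'" regex="(t)|(.)">
      <xsl:matching-substring>
        <xsl:value-of select="concat('[', regex-group(1), '|', regex-group(2), ']')"/>
      </xsl:matching-substring>
    </xsl:analyze-string>

  </xsl:template>
</xsl:stylesheet>
```

The first instruction should print 1=[this is a text] 2=[his is a text] 3=[]. The second prints one bracket pair per character: [t|] for each 't' and [|c] for any other character c.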

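If you want to know whether a regex-group actually matched anything, you can check its string-length (note that a group that matched the empty string and a group that did not match at all both give zero). The same numbering rules apply to $1, $2, etc., but you can only use those in the replacement string of replace(), not inside analyze-string.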
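
Both techniques side by side, as a rough sketch: the string-length test inside analyze-string, and $1/$2 in the replacement string of replace(). The sample text 'at', the upper-casing of 't' and the bracketed replacement are only assumptions for the sake of the example.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Apply to any XML input, or start at the named template "main". -->
  <xsl:template match="/" name="main">

    <!-- Inside analyze-string: test which group took part in the match. -->
    <xsl:analyze-string select="'at'" regex="(t)|(.)">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="string-length(regex-group(1)) gt 0">
            <!-- First branch matched: emit an upper-case T instead. -->
            <xsl:text>T</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <!-- Second branch matched: copy the character unchanged. -->
            <xsl:value-of select="regex-group(2)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
    </xsl:analyze-string>

    <xsl:text>&#10;</xsl:text>

    <!-- Inside replace(): $1 and $2 follow the same numbering, and a
         group that did not match contributes a zero-length string. -->
    <xsl:value-of select="replace('at', '(t)|(.)', '[$1|$2]')"/>

  </xsl:template>
</xsl:stylesheet>
```

This should print aT on the first line and [|a][t|] on the second.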

-- Abel Braaksma
  http://www.nuntia.nl
