Re: [xsl] two regexp related questions

Subject: Re: [xsl] two regexp related questions
From: Julian Reschke <julian.reschke@xxxxxx>
Date: Thu, 19 May 2011 22:45:30 +0200
On 2011-05-19 22:24, Imsieke, Gerrit, le-tex wrote:


On 2011-05-19 21:16, Julian Reschke wrote:
On 2011-05-19 20:51, Brandon Ibach wrote:
For 2), if you're using the regex to both validate the input (making
sure it conforms to the required syntax) and parse/extract the
name/value pairs, you might be able to make the job easier by breaking
these two tasks apart. Use the regex as you have it now to validate
the input and then, if it matches, use a shorter regex that matches
just a single name/value pair with analyze-string to do the actual
processing.

-Brandon :)

That's more or less what I do now. But as long as the regex contains a repeating pattern, <xsl:matching-substring> will only be invoked once, and the regex-group function will only return the contents for the last match, right?

I think it depends on the implementation. I couldn't see anything in the spec about what regex-group(3) of ([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))* should be. In Saxon, it's ';e=f' for your example, but in principle it could also be ';c=d'.
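
As a point of comparison outside XSLT, here is a small sketch of how one other regex engine treats that pattern. It uses Python's re module, which keeps only the final iteration of a repeated capturing group; this says nothing about what an XSLT processor is required to do. The input string a=b;c=d;e=f is an assumed example, chosen to be consistent with the ';e=f' and ';c=d' values mentioned above.

```python
import re

# Pattern as quoted in the thread; the input string is an assumed example.
PATTERN = r"([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))*"
m = re.fullmatch(PATTERN, "a=b;c=d;e=f")

# Groups 1 and 2 sit outside the repetition, so they are unambiguous.
first_name, first_value = m.group(1), m.group(2)

# Group 3 sits inside the repetition; Python reports its last iteration only.
last_iteration = m.group(3)

print(first_name, first_value, last_iteration)
```

Either way, the middle pair (c=d here) is not reachable through group numbers alone, which is why a per-pair pattern is easier to work with.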

As Brandon pointed out, using analyze-string with a repeating pattern
that matches the entire string is not the best approach. There are more
natural approaches that work without recursion. I sketched two of them
below.
..

Wow, thanks for the feedback.


What I did not mention in my mail is that I simplified things. First of all, tokenize() won't work, because the separator has to take context into account: the right-hand side can be a quoted string, which can itself contain a ";".
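
To make the quoted-string problem concrete, here is a minimal sketch in Python (not XSLT). The header value and the PIECE pattern are assumptions made up for illustration; the backslash-escape handling inside quotes is likewise assumed.

```python
import re

# Assumed sample value: the quoted string contains the separator.
header = 'attachment; filename="a;b.txt"'

# Naive split on ";" cuts the quoted string in half.
naive = [piece.strip() for piece in header.split(";")]

# Quote-aware split: a piece is a run of quoted strings and characters
# that are neither ";" nor a double quote.
PIECE = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^;"])+')
aware = [piece.strip() for piece in PIECE.findall(header)]

print(naive)
print(aware)
```

The naive version yields three pieces, the quote-aware one yields two.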

Also, the syntax is slightly more complex; the first component differs from the other components.

What I'm trying to parse is an HTTP header field syntax, shared by header fields like Content-Type or Content-Disposition:

  value = name ( ";" param )*
  name = token
  param = token "=" (token | quoted-string)
  ...

(in IETF ABNF speak).
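
A rough sketch of the validate-then-extract idea applied to this grammar, written in Python rather than XSLT. Everything in it is an assumption for illustration: the token character set is a simplified guess at the HTTP token rule, optional whitespace around ";" is allowed although the grammar above does not show it, and parse_header_value is a hypothetical helper name. The rules elided by "..." above are not covered.

```python
import re

# Simplified, assumed definitions of token and quoted-string.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'
PARAM = rf"({TOKEN})=({TOKEN}|{QUOTED})"

# Step 1: one pattern that validates the whole value.
VALUE = re.compile(rf"({TOKEN})(?:\s*;\s*{PARAM})*")
# Step 2: a shorter pattern that matches a single ";" param.
ONE_PARAM = re.compile(rf"\s*;\s*{PARAM}")


def parse_header_value(text):
    """Return (name, [(param_name, param_value), ...]) or raise ValueError."""
    m = VALUE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid header value: {text!r}")
    params = []
    # Iterate over the params one at a time, starting after the name.
    for pm in ONE_PARAM.finditer(text, m.end(1)):
        key, raw = pm.group(1), pm.group(2)
        if raw.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params.append((key, raw))
    return m.group(1), params


print(parse_header_value('attachment; filename="a;b.txt"; size=12'))
```

The full pattern is used only to accept or reject the input; the per-param pattern does the extraction, so no repeated capturing group has to be inspected.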

The actual code I currently have and which works is in

http://greenbytes.de/tech/tc2231/tc2231.xslt

to be applied to

http://greenbytes.de/tech/tc2231/tc2231.xml

I currently have one template for matching the whole expression, which delegates to another one for

( ";" param )*

which itself matches the first param, and then recurses. This probably can be simplified as in your "as" example.
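
For readers who want to see that recursive shape in isolation, here is a minimal sketch in Python. It is not the code from tc2231.xslt; FIRST_PARAM and parse_params are hypothetical names, the character classes are simplified assumptions, and quoted values are returned with their quotes left in place.

```python
import re

# Matches one leading ';' param at the start of the remaining input.
# Simplified stand-in for the grammar; not taken from tc2231.xslt.
FIRST_PARAM = re.compile(
    r'\s*;\s*([^\s;="]+)=("(?:[^"\\]|\\.)*"|[^\s;"]+)'
)


def parse_params(rest):
    """Match the first param, then recurse on whatever follows it."""
    if rest.strip() == "":
        return []  # nothing left: end of recursion
    m = FIRST_PARAM.match(rest)
    if m is None:
        raise ValueError(f"cannot parse params at: {rest!r}")
    return [(m.group(1), m.group(2))] + parse_params(rest[m.end():])


print(parse_params('; a=b; c="d;e"'))
```

Because each step only consumes a prefix and hands the remainder on, the same logic can be written as a loop over successive matches instead of a recursion.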

Thanks for the feedback, Julian
