Re: [xsl] How to split an RegEx into several lines for readability?

Subject: Re: [xsl] How to split an RegEx into several lines for readability?
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 02 May 2007 14:16:58 +0200
Dimitre Novatchev wrote:


<xsl:variable name="myregex" as="xs:string">
   (          <!-- grab everything -->
   "          <!-- start of a q. string -->
   [^"]*      <!-- zero or more non-quotes -->
   "          <!-- end of a q. string -->
   )          <!-- closing 'grab all' -->
</xsl:variable>



I think this is probably the most amazing and useful tip I got in this
thread -- fully deserves to be in the XSLT FAQ!

I have never before seen a string variable defined in this way --
probably because we do not have an "x" modifier (or any modifiers at all)
when generally defining variables.

And you don't need to, as you can see above :)
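
A minimal sketch of how such a variable might be used, assuming an XSLT 2.0 processor. The whitespace stays in the string value (only the stylesheet comments are dropped), so the "x" flag is given where the regex is applied rather than where it is defined. The named template "main" and the sample input string are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- The commented regex from above; comments vanish, whitespace remains -->
  <xsl:variable name="myregex" as="xs:string">
     (          <!-- grab everything -->
     "          <!-- start of a q. string -->
     [^"]*      <!-- zero or more non-quotes -->
     "          <!-- end of a q. string -->
     )          <!-- closing 'grab all' -->
  </xsl:variable>

  <!-- Hypothetical entry point, not part of the original post -->
  <xsl:template name="main">
    <!-- flags="x" tells the regex engine to ignore the layout whitespace -->
    <xsl:analyze-string
        select="'say &quot;hello&quot; and &quot;bye&quot;'"
        regex="{$myregex}" flags="x">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this could be run using the named template as the entry point (for instance -it:main); it should print each quoted string of the sample input on its own line.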


Btw, many languages define a Regex object or type, which is something like a precompiled regular expression. Unfortunately, the W3C never created something like xs:regex, which keeps us from defining precompiled regexes. If I understood Michael's comment correctly, the optimizer may or may not 'see' whether the regular expression is static or dynamic. If it decides that it is dynamic, the method above will introduce quite a performance hit, as Saxon will recompile the regex on each use.

If only I were capable of having an input XML document with elements of type xs:regex, the parser could (then) precompile and reuse them (but now I'm drifting...).


Also, I don't think I've ever before seen comments interspersed within the string contents of an xsl:variable.

Well, you should have a look at my code ;)
We do a lot of text-to-XML transformation, and complex regular expressions help us a lot but are notoriously hard to read. Hence, about a year ago, I asked much the same question, and I summarized the cumulative answers here: http://www.biglist.com/lists/xsl-list/archives/200607/msg00733.html


Other ideas that several people came up with included:

* use a custom function that takes a string, removes your self-defined comments and returns a valid regex (there's another thread where I offered a regex-test by a regex, which may come in useful)
* use the AVT possibilities to introduce XPath comments: regex=".*{''(: comment here :)}[abc]" or regex=".*{()(: comment here :)}[abc]"
* various ways of concat/join plus interspersed XPath comments
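
As a rough sketch of the last idea (join plus interspersed XPath comments), assuming XSLT 2.0: each fragment is a string literal, and the XPath comments (: ... :) sit between the fragments. The variable name, the template "main" and the sample input are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Fragments are joined without a separator; XPath comments are ignored -->
  <xsl:variable name="joined-regex" as="xs:string" select="
      string-join((
        '(',           (: grab everything :)
        '&quot;',      (: start of a quoted string :)
        '[^&quot;]*',  (: zero or more non-quotes :)
        '&quot;',      (: end of a quoted string :)
        ')'            (: closing 'grab all' :)
      ), '')"/>

  <!-- Hypothetical entry point, not part of the original post -->
  <xsl:template name="main">
    <xsl:analyze-string
        select="'say &quot;hello&quot; and &quot;bye&quot;'"
        regex="{$joined-regex}">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The joined string contains no layout whitespace, so no "x" flag is needed here. The price is that quotes inside the select attribute have to be escaped (&quot;), which the element-content form of xsl:variable avoids.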


Also, in that thread, Michael explained why there is no construct similar to the '#' comment in Perl regexes.

In practice, I have found the xsl:variable approach extremely useful with AVTs, not only because it makes verbose, commented regexes easy to write, but also because it relieves me of doubling { and } characters and of escaping (mixed) quotes.
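
To illustrate the brace doubling, here is a small sketch, assuming XSLT 2.0. The regex attribute of xsl:analyze-string is an AVT, so a literal quantifier such as {2,4} has to be written with doubled braces, whereas the same regex held in a variable is written once. The names "text", "digits" and "main" and the sample string are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical sample input -->
  <xsl:variable name="text" as="xs:string" select="'call 12 or 2007 today'"/>

  <!-- Element content is not an AVT: braces are written once -->
  <xsl:variable name="digits" as="xs:string">\d{2,4}</xsl:variable>

  <!-- Hypothetical entry point, not part of the original post -->
  <xsl:template name="main">
    <!-- Literal regex inside the AVT: { and } must be doubled -->
    <xsl:analyze-string select="$text" regex="\d{{2,4}}">
      <xsl:matching-substring>
        <xsl:value-of select="concat(., '&#10;')"/>
      </xsl:matching-substring>
    </xsl:analyze-string>

    <!-- Same regex through the variable: only the AVT's own braces appear -->
    <xsl:analyze-string select="$text" regex="{$digits}">
      <xsl:matching-substring>
        <xsl:value-of select="concat(., '&#10;')"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Both xsl:analyze-string instructions apply the same regex, so they should report the same matches.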

Cheers,
-- Abel Braaksma
