[xsl] Why does '#' start a comment in regular expressions with 'x' modifier flag? How can I match '#'?

Subject: [xsl] Why does '#' start a comment in regular expressions with 'x' modifier flag? How can I match '#'?
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Thu, 21 Dec 2006 15:30:16 +0100

Dear List,

This struck me as really odd, and I couldn't find a good reference explaining why it works this way. On a side note, about a year ago I had a discussion on this very list about comments inside regular expressions. "We" came to the conclusion that yet another comment construct, next to the XPath comment and the XML comment, was not desirable. At least that's what I thought.

Now I have found out that, at least with Saxon (I'm not sure about other processors), the '#' character is treated as a comment delimiter when the 'x' flag (ignore whitespace) is set. Furthermore, it does not appear to be possible to escape the '#' character. Examples:

<xsl:analyze-string select="'#test'" regex="#" flags="">
---> result: MATCH (for '#')

<xsl:analyze-string select="'#test'" regex="#" flags="x">
---> result: ERROR: "The regular expression must not be one that matches a zero-length string" (?)


<xsl:analyze-string select="'#test'" regex="test # comment" flags="x">
---> result: MATCH (for 'test')

<xsl:analyze-string select="'#test'" regex="[#]test # comment" flags="x">
---> result: ERROR: "The regular expression must not be one that matches a zero-length string" (?)


<xsl:analyze-string select="'#test'" regex="\#test # comment" flags="x">
---> result: ERROR: "Invalid escape sequence"

<xsl:analyze-string select="'#test'" regex="&#x23;test # comment" flags="x">
---> result: ERROR: "The regular expression must not be one that matches a zero-length string" (?)
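
For comparison, here is a minimal sketch of the same experiment in Python's re module, whose re.VERBOSE flag follows the Perl-style "extended" convention in which '#' starts a comment. This is only an analogy: it is an assumption on my part that Saxon's handling of the 'x' flag follows the same convention, and Python is not the XPath regex dialect. Note that the two escapes that keep '#' literal here (a character class and a backslash) are exactly the ones that fail in the XSLT examples above.

```python
import re

text = "#test"

# Without the verbose flag, '#' is an ordinary character.
plain = re.search(r"#", text)

# With re.VERBOSE, a bare '#' starts a comment, so the pattern is empty
# and matches the zero-length string at position 0.
bare = re.search(r"#", text, re.VERBOSE)

# Whitespace and everything from '#' onwards are dropped; only 'test' remains.
commented = re.search(r"test # comment", text, re.VERBOSE)

# In Python (not necessarily in XSLT), a character class or a backslash
# keeps '#' literal even in verbose mode.
in_class = re.search(r"[#]test # comment", text, re.VERBOSE)
escaped = re.search(r"\#test # comment", text, re.VERBOSE)

for name, m in [("plain", plain), ("bare", bare), ("commented", commented),
                ("in_class", in_class), ("escaped", escaped)]:
    print(name, repr(m.group(0)))
```

The zero-length match in the second case would be consistent with the "matches a zero-length string" error reported for regex="#" with flags="x", if the processor likewise discards everything from '#' onwards.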



My application relies heavily on regular expressions, and I write them with a lot of whitespace for readability. Now that I know that '#' can be used to start a comment, I may very well make use of it (though the editor's syntax coloring won't recognize it). However, I have absolutely no idea how to match the '#' character, and I really do have to match it :S (it is a comment delimiter in some textual input data and must be removed).


Is this a part of the XSLT spec that I missed? Is it a bug in Saxon? Does anybody know of a workaround (one that still works with the 'x' flag)?

Any ideas are welcome,

Cheers,
Abel
