Subject: Re: [xsl] Need an XPath expression which returns all xs:pattern elements containing a regex that permits an unbounded number of characters From: "C. M. SperbergMcQueen cmsmcq@xxxxxxxxxxxxxxxxx" <xsllistservice@xxxxxxxxxxxxxxxxxxxxxx> Date: Fri, 5 Apr 2024 15:01:52 0000 
"Willem Van Lishout willemvanlishout@xxxxxxxxx" <xsllistservice@xxxxxxxxxxxxxxxxxxxxxx> writes: > Is this even possible, theoretically speaking? As soon as you start > using lookaheads, square brackets, and so on, your patterns will > likely fail. I don't think regex can parse regex. Good observation, quite correct. Since the language of regular expressions in XSD (and XPath) is context free, not a regular language, it cannot be recognized in full by a regular expression. There are, I think, two ways to deal with that fact: 1 In the general case, if we limit the maximum nesting depth of a contextfree language to some finite number, the language becomes regular. So we cannot write a regular expression to match all and only the legal regular expressions of XSD or XPath, but we *can* write one that recognizes all and only the ones with at most one level of nesting for parentheses or square brackets. We can also write one that recognizes all and only the expressions with at most two, or three, or any positive n, levels of nesting. (The result is not guaranteed to be pretty.) For any finite set of inputs, there will be some maximum nesting level observed; if you have written your regular expression to handle nesting that deep, your regex will produce neither false positives nor false negatives for that input. How comforting this is depends on how costly a false positive or negative would be, and how costly it would be to parse the regular expressions in question using something like ixml. (Hmm. Good idea for a sample ixml grammar!) 2 In this case, we can observe that the requirement is only to detect patterns which allow strings of unbounded length, and thus regular expressions which use quantifiers like *, +, or {n,} on any base expression. We can assume that all inputs are syntactically well formed, so we do not need to distinguish wellformed from illformed expressions. Nesting of parentheses does not matter and can be ignored. Nesting of square brackets turns out also not to matter, since the characters *, +, {, and } are effectively escaped at every possible nesting level in a character class expression, and once the first right bracket is encountered, no character other than right bracket can be encountered until all open square bracket pairs have been closed. So: no, theoretically speaking, it's not possible to recognize regular expressions with regular expressions. But practically speaking, *for this problem* (a) an approximation is possible, and in fact better than that (b) a correct solution is possible, because of patterns in the language in question which we can see and capture.  C. M. SperbergMcQueen Black Mesa Technologies LLC http://blackmesatech.com
Current Thread 


< Previous  Index  Next > 

Re: [xsl] Need an XPath expression , Liam R. E. Quin liam  Thread  Re: [xsl] Need an XPath expression , Dimitre Novatchev dn 
Re: [xsl] Need an XPath expression, Liam R. E. Quin liam  Date  [xsl] [Solution] Need an XPath expr, Roger L Costello cos 
Month 