Subject: RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: "Chris Bayes" <chris@xxxxxxxxxxx>
Date: Mon, 14 Jan 2002 15:02:48 -0000

> Chris,
>
> > I've been a bit tied up with one thing and another (and I think you
> > might have discussed this before) but aren't regex matches just
> > predicates on text nodes ala
> > <xsl:template match="text()['\(.*\)']">
> >   <x><xsl:apply-templates select=".[1]" /></x>
> > </xsl:template>
> > Which applies templates to whatever is not matched (child texts) (but
> > which matches the template).
>
> Not all strings that you might deal with are text nodes, so I
> think that you need to provide something that allows you to
> match other strings as well. Indeed, your example above
> demonstrates this - when you do .[1], then presumably you're
> applying templates to the matched substring of the current
> text node. I think that there are three possibilities:
>
>   - assume that when you apply templates to a string, it's
>     automatically converted to a text node, and apply templates to
>     that
>   - open up normal templates so that they can match things other than
>     nodes

What is wrong with that? A template that matches text is pretty much the
end of the line anyway.

>   - introduce specific regexp templates
>
> > So that template on a text node
> > "(a(b(c)d)e)" (assuming greedy) would produce
> > <x>
> >   a
> >   <x>
> >     b
> >     <x>
> >       c
> >     </x>
> >     d
> >   </x>
> >   e
> > </x>
>
> Unfortunately, assuming greedy, (a)(b) would produce:
>
>   <x>a)(b</x>
>

Yeh but it doesn't have to be greedy.

<xsl:template match="\((.*?)\)(.*)">
  <x><xsl:apply-templates select=".[1]" /></x>
  <xsl:apply-templates select=".[2]" />
</xsl:template>

Or

<xsl:template match="\((.*?)\)">
  <x><xsl:apply-templates select=".[1]" /></x>
  <xsl:apply-templates select="$'" />
</xsl:template>

> which is probably not what you want. This is why I suggested
> the bracket-balancing tokenize() function.
> For example, you'd have:
>
>   <xsl:apply-regexp-templates select="'(a(b(c)(d))e)'" />
>
> and then:
>
>   <xsl:regexp-template match="\((.*)\)">
>     <x>
>       <xsl:apply-regexp-templates
>         select="tokenize(current-match()[1], '\(', '\)')" />
>     </x>
>   </xsl:regexp-template>
>
> would give:
>
>   <x>a<x>b<x>c</x><x>d</x></x>e</x>
>
> > Maybe it's rubbish but it doesn't look too alien to me. What other
> > useful predicates can you put on a text node?
>
> Commonly, I'd guess:
>
>   text()[1]
>   text()[normalize-space()]
>   text()[starts-with(., 'foo')]
>   text()[contains(., 'foo')]
>
> The second one is the one that would clash with what you're
> suggesting (where any string used as the predicate to a text
> node acts as an implicit regexp test on the value of the text node).

Yeh but they are integers or booleans except 2 which would be false for
<x>a b</x>
hmmmm

>
> But you could always have a test() function that does the
> test explicitly instead:
>
>   text()[test('\(.*\)')]
>
> Or the other option is to have a special syntax to refer to a
> regular expression,

You mean like
text()['regexp']
Which can't be confused with
text()[normalize-space()]

> or even to make regular expressions first
> class objects.
>
> > Surely it isn't going to clash with anything. There are nearly 1000
> > pages of wd's to look at here so looking at it another way is there
> > anything that says that . can't be a sequence and that I can't index
> > into it with .[x]?
>
> . is defined as being the context item (or a singleton
> sequence containing the context item,

Which it would be for a node but for a regex it wouldn't be.

> depending on how you
> want to view it), so logically .[2] should never return
> anything. Currently, as in XPath 1.0, . is an abbreviated
> step and cannot take any StepQualifiers (which includes predicates).
>
> The way I (and I think David) was thinking, you'd use
> current-match() or some other function to get information
> about the subexpression matches when you were inside the
> template. So perhaps:
>
>   current-match()[x]
>
> rather than .[x].

Well if you like typing ;-)

Ciao Chris

XML/XSL Portal
http://www.bayes.co.uk/xml

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
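
The greedy versus non-greedy point in this thread, and the case for a bracket-balancing tokenizer, can be checked outside XSLT. The following is a rough Python sketch, not the syntax proposed in the thread: it uses Python's `re` module in place of the hypothetical regexp templates, and `wrap_groups` is a made-up helper name that stands in for recursive template application over balanced groups.

```python
import re

# Greedy: .* runs to the last ')', so "(a)(b)" yields one match, "a)(b".
greedy = re.findall(r"\((.*)\)", "(a)(b)")

# Non-greedy: .*? stops at the first ')', so each group matches separately.
lazy = re.findall(r"\((.*?)\)", "(a)(b)")

# Non-greedy alone does not handle nesting: it stops at the first ')'
# even when that ')' closes an inner group.
lazy_nested = re.findall(r"\((.*?)\)", "(a(b(c)d)e)")


def wrap_groups(s: str) -> str:
    """Wrap every balanced (...) group in <x>...</x>, recursing into the
    group's contents (hypothetical helper, loosely mirroring the
    bracket-balancing tokenize() idea). A stray ')' is copied as-is."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "(":
            # Scan forward, counting depth, to find the matching ')'.
            depth = 1
            j = i + 1
            while j < len(s) and depth:
                if s[j] == "(":
                    depth += 1
                elif s[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unbalanced parentheses")
            # s[i + 1 : j - 1] is the text between the matched pair.
            out.append("<x>" + wrap_groups(s[i + 1 : j - 1]) + "</x>")
            i = j
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


print(greedy)       # ['a)(b']
print(lazy)         # ['a', 'b']
print(lazy_nested)  # ['a(b(c']
print(wrap_groups("(a(b(c)(d))e)"))  # <x>a<x>b<x>c</x><x>d</x></x>e</x>
```

The non-greedy pattern handles sibling groups such as (a)(b) but not nested ones, which is why some form of depth counting is needed for input like (a(b(c)(d))e).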