RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Fri, 11 Jan 2002 14:11:31 -0000
> You mean, assuming that current-match() returned the node tree
> described in the mail, if I did:
>   current-match()/mantissa == current-match()/mantissa
> would the result be true or false? Or if I did:
>   match($string1, $regexp1) == match($string2, $regexp2)
> would the result be true or false?

Yes, that is the question.

> I think that in both cases returning different trees would be more
> consistent, since user-defined functions won't have the luxury of
> being able to reuse trees.

The argument for returning the same tree is the one that applies to the
document() function: it means that a call can safely be optimized, either by
moving it out of a loop or by eliminating common sub-expressions.
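
To make the optimization argument concrete, here is a loose Python analogy rather than anything from the F&O draft. Object identity (`is`) stands in for the node-identity comparison written `==` in the quoted expressions. `MatchTree`, `match_fresh` and `match_stable` are hypothetical names. If every call builds a new tree, rewriting two identical calls into one changes the result of the identity test, so the optimizer may not perform that rewrite. If equal arguments always yield the same tree, as document() does for the same URI, the rewrite is safe.

```python
import re
from functools import lru_cache

class MatchTree:
    """Hypothetical stand-in for the node tree a regex match function might return.
    Instances have identity, like nodes do."""
    def __init__(self, string, pattern):
        m = re.fullmatch(pattern, string)
        self.groups = m.groupdict() if m else {}

def match_fresh(string, pattern):
    # Builds a new tree on every call: two calls never return the same object.
    return MatchTree(string, pattern)

@lru_cache(maxsize=None)
def match_stable(string, pattern):
    # Returns the same tree for the same arguments, like document() for one URI.
    return MatchTree(string, pattern)

PATTERN = r"(?P<mantissa>\d+)e(?P<exponent>\d+)"

def unoptimized(f, s):
    # Evaluates the call twice, exactly as the expression is written.
    return f(s, PATTERN) is f(s, PATTERN)

def after_cse(f, s):
    # Common sub-expression elimination: evaluate once and reuse the result.
    t = f(s, PATTERN)
    return t is t

for f in (match_fresh, match_stable):
    print(f.__name__, unoptimized(f, "12e3"), after_cse(f, "12e3"))
```

This prints `match_fresh False True` and then `match_stable True True`. Only the stable version gives the same answer before and after the rewrite.
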
> > I guess one could say that it's explicitly implementation-defined,
> > and no-one would worry too much about it. But it's also something
> > you want to avoid if at all possible because constructing new trees
> > is always expensive.
> Is that because constructing *nodes* is expensive or is it the *links*
> between the nodes within a tree that makes things problematic?

Both. Creating objects with identity is expensive in most languages, because
it involves memory-allocation overhead. The need to support all the axes
also makes the objects quite heavyweight.

> If the latter, then perhaps documentless nodes are a blessing ;)

Not if it means they have to be copied by physical cloning!

> If the former, then it's a good argument for nested sequences, so you
> don't have to create nodes to provide structure.

Yes, there are some good arguments for nested sequences. But let's not go
there; we want to get this thing finished.

As for the regular expression functionality you are trying to define, I've
been trying to follow the arguments but haven't reached any firm view on
what the right answer is. I don't have much personal experience of languages
that use regexps heavily, which doesn't help. It might be that a solution
similar to xsl:for-each-group is needed. The design of xsl:for-each-group
was constrained by the fact that we couldn't model a set of groups directly
in the data model. So instead we defined an instruction that iterates over
the set of groups, presents one group at a time to the application, and
makes that group available through the magic function current-group(). I
have a feeling that an xsl:for-each-string-match instruction might work in a
similar way, but I can't articulate the details yet. Keep working at it,
guys.

Mike Kay
