Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Sun, 13 Jan 2002 00:33:56 +0000

Hi David,

>> Honestly, I can't see much difference between having to handle this
>> syntax and having to handle:
>
> ah that's easy I just use d-o-e for that:-)

Quite. But (I think) you would recommend to people who presented you
with that kind of format, that changing the source XML was a better
way than trying to design a stylesheet to do it. Did I get it right,
by the way? ;)

> I'm not convinced yet actually, I think it duplicates too much
> existing xslt functionality. It's clear that lex/yacc accepts a
> larger class of grammars than regular expressions (I see Dimitre's
> already supplied the details), but I think (somehow) the extra
> functionality (which basically just always comes down to nesting,
> counting and storing information for look-ahead) is already present
> in xslt so the trick is to add regexp support (only) in a way that
> the extra arithmetic functionality can be pulled from xpath/xslt for
> those occasions when you need it. Since we argued a long time ago
> that xslt was turing complete (given an approximation to an infinite
> tape) everything's possible anyway so it's only a matter of
> convenience.

Well of course it's all just a matter of convenience :) This seemed a
very convenient way, to me, of parsing what you'd described. I'd
rather describe the grammar than describe the parsing process, if you
see what I mean. Perhaps it's all the BNF in the WDs affecting me...

The really light-weight method is a simple match() function that
returns start/length pairs of integers. If you have that, then
assuming that templates were allowed to match simple typed values, you
can parse your string with something like:

<xsl:template match="value of type xs:string" mode="row">
  <!-- $match holds start/length pairs: [1],[2] for the whole match,
       [3],[4] for the first parenthesised group -->
  <xsl:variable name="match" select="match(., '^\\([a-z]+)\{')" />
  <xsl:choose>
    <xsl:when test="$match[1] = 1">
      <xsl:variable name="name"
                    select="substring(., $match[3], $match[4])" />
      <start name="{$name}" />
      <!-- carry on with whatever follows the matched text -->
      <xsl:apply-templates select="substring(., $match[1] + $match[2])"
                           mode="row" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates select="." mode="expr" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

... and so on in a way that it's really too late at night to detail.
You can manually keep track of brackets with parameters and so on.
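
To make the start/length idea a bit more concrete, here is a rough
sketch in Python rather than XSLT, since no such match() function
exists yet. Everything in it is an assumption for illustration: the
name match(), the flat list of 1-based pairs (whole match first, then
one pair per group), and the xpath_substring() helper that stands in
for substring().

```python
import re


def match(text, pattern):
    """Hypothetical match(): a flat list of 1-based start/length pairs.

    The first pair describes the whole match and each later pair
    describes one capturing group. An empty list means no match.
    """
    m = re.search(pattern, text)
    if m is None:
        return []
    pairs = []
    for group in range(m.re.groups + 1):
        start, end = m.span(group)
        pairs += [start + 1, end - start]  # convert to 1-based start, length
    return pairs


def xpath_substring(text, start, length=None):
    """Stand-in for XPath substring() with positive integer arguments."""
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]


source = r"\frac{a}{b}"
pairs = match(source, r"^\\([a-z]+)\{")

# Python lists are 0-based, so pairs[0] plays the role of $match[1].
name = xpath_substring(source, pairs[2], pairs[3])
rest = xpath_substring(source, pairs[0] + pairs[1])

print(pairs, name, rest)
```

The name comes out of the second pair, and the remainder of the string
starts at the whole match's start plus its length.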

I imagine that this is similar to what you're doing at the moment,
except that using a regexp rather than substring() et al. might make
your life a little easier.

Actually, I think that the \frac{...}{...} construct is fairly
difficult to handle, but like you say, anything can be done.
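
The awkward part, I think, is that the arguments can nest, so a regexp
alone can't find the closing brace and you have to count. A minimal
sketch of that counting, again in Python as a stand-in (read_group()
and parse_frac() are names made up for this example, not anything
proposed in the drafts):

```python
def read_group(text, pos=0):
    """Return (content, next_pos) for the balanced {...} group opening at pos."""
    if pos >= len(text) or text[pos] != "{":
        raise ValueError(f"expected '{{' at offset {pos}")
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Drop the outer braces; report where parsing should resume.
                return text[pos + 1:i], i + 1
    raise ValueError("unbalanced braces")


def parse_frac(text):
    """Split a frac construct into its numerator and denominator strings."""
    prefix = "\\frac"
    if not text.startswith(prefix):
        raise ValueError("not a frac construct")
    numerator, pos = read_group(text, len(prefix))
    denominator, pos = read_group(text, pos)
    return numerator, denominator


print(parse_frac(r"\frac{\frac{a}{b}}{c}"))
```

In XSLT the depth counter would presumably be a parameter passed along
a recursive template, which is the manual bracket tracking mentioned
above.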

I'll think some more...

> given a lex implementation you could (I think, haven't checked)
> implement an Xpath parser if you wished...

Already got an XPath parser. Parsing is the easy part of evaluate() -
the evaluation is the hard part (to implement in XSLT).

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



