Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 8 Jan 2002 19:44:57 +0000
David,

>> Presumably the match patterns would match the entire string? 
>
> You think I have any idea how it's supposed to work?

I assumed you'd nicked the idea from OmniMark or somewhere where
presumably some clever bods had worked everything out for you. But if
you're just making it up as you go along, even more reason to thrash
out the details before you write it up into a nice proposal to send to
the various comments lists ;)

> Actually no, I think not: see my example, where first the \gamma
> template fired and then the \sqrt one, even though they're in the same string.

So priority would be a real issue wherever matches overlap. And the
XML Schema regexp syntax would need to be amended so that it can match
the start/end of strings, since Schema patterns are implicitly anchored
to the whole value and have no ^ or $.
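To make the priority question concrete, here is a minimal Python sketch of how a processor might dispatch hypothetical regexp-matching templates over one string. At each position every pattern is tried; the highest-priority match wins (later declarations break ties, loosely mirroring XSLT's recovery from template conflicts), and unmatched characters are copied through as a built-in rule would. The function name, the (priority, pattern, action) triples and the TeX-like sample are assumptions for illustration, and Python's `re` stands in for XML Schema regexps: `pattern.match(text, pos)` tries a pattern at a given position rather than against the whole string.

```python
import re


def apply_regexp_templates(text, templates):
    """Rewrite text using hypothetical regexp templates.

    templates is a list of (priority, compiled_pattern, action) triples;
    action receives the re.Match object and returns replacement text.
    """
    out = []
    pos = 0
    while pos < len(text):
        best = None
        for index, (priority, pattern, action) in enumerate(templates):
            m = pattern.match(text, pos)  # try the pattern at this position only
            if m and m.end() > pos:  # ignore empty matches
                key = (priority, index)  # ties go to the later declaration
                if best is None or key > best[0]:
                    best = (key, m, action)
        if best is None:
            out.append(text[pos])  # like a built-in rule: copy the character
            pos += 1
        else:
            _, m, action = best
            out.append(action(m))
            pos = m.end()  # resume after the consumed substring
    return "".join(out)


TEMPLATES = [
    # Generic fallback for any \command, deliberately given a low priority.
    (0.5, re.compile(r"\\([a-z]+)"), lambda m: f"[{m.group(1)}]"),
    (1.0, re.compile(r"\\gamma"), lambda m: "\u03b3"),
    (1.0, re.compile(r"\\sqrt\{([^}]*)\}"), lambda m: f"\u221a({m.group(1)})"),
]

print(apply_regexp_templates(r"x = \gamma + \sqrt{y} - \alpha", TEMPLATES))
```

Here `\gamma` and `\sqrt{y}` also match the generic `\\([a-z]+)` pattern at the same position, so only their higher priority keeps them from being rendered as `[gamma]` and `[sqrt]`.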

>> The big weakness if this was the *only* method of doing matches is
>> that you can't use it with dynamic regular expressions - for
>> example match strings that contain the keyword $keyword as a word.
>> Any ideas around that one?
>
> Currently template matches don't allow global variable references, in a
> (failed) attempt to prevent circularity; if template matches could
> have global variable references then these regexp ones could too.
> Then of course you'd have to decide whether the regexp string attribute
> was an XPath expression, so that you'd build the regexp using concat(),
> or whether it is a string that allows XPath expressions such as variable
> references using AVT {} syntax.

Hmm... I think that both would be confusing, given that normal
template match attributes are neither expressions nor AVTs. Perhaps the
template-regexp-matching idea could be supplemented with a
regexp-matching instruction, something like:

  <xsl:regexp match="{RegExp}">
    ...
  </xsl:regexp>

with the match attribute being an AVT, which would allow variable
references in the way you suggested. Obviously this is less powerful
when you want to do lots of replacements on the same string, but it
would be handier for dynamic regexps, or for inline code that you
don't want to split off into a separate template (for example,
because the regular expression should match the entire string anyway).
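A rough Python analogue of the dynamic case: the pattern is assembled at run time from a variable, much as an AVT like `match="{$keyword}"` would do, with `re.escape` guarding against regexp metacharacters in the keyword. The second function shows the whole-string behaviour of XML Schema patterns, which `re.fullmatch` imitates. The function names are illustrative, and the lookarounds used for the word boundary are Python syntax; XML Schema regexps offer no lookarounds or `\b`.

```python
import re


def contains_word(text: str, keyword: str) -> bool:
    """True if keyword occurs in text as a whole word."""
    # The pattern is built at run time from a variable, as an AVT would be.
    # re.escape stops keywords like "c++" being read as regexp syntax, and
    # the lookarounds treat any non-word character (or string edge) as a
    # boundary, which also works for keywords that end in punctuation.
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)")
    return pattern.search(text) is not None


def matches_entirely(text: str, regexp: str) -> bool:
    """XML Schema-style test: the regexp must match the whole string."""
    return re.fullmatch(regexp, text) is not None


print(contains_word("apply the keyword here", "keyword"))       # True
print(contains_word("two keywords", "keyword"))                 # False
print(contains_word("learn c++ today", "c++"))                  # True
print(matches_entirely("2002-01-08", r"\d{4}-\d{2}-\d{2}"))     # True
print(matches_entirely("on 2002-01-08", r"\d{4}-\d{2}-\d{2}"))  # False
```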

>> occurred to me that this could be useful if, for example, you
>> wanted all your numbers to be formatted in the same way throughout
>> the document - you could have:
>>
>>   <xsl:template match="value of type xs:decimal">
>>     <xsl:value-of select="format-number(., '#,##0.00')" />
>>   </xsl:template>
>
> but if an element node is schema-typed you can presumably write
>
> <xsl:template match="text()[. = cast-to-decimal(.)]">
>   <xsl:value-of select="format-number(., '#,##0.00')" />
> </xsl:template>
>
> once they decide what the constructor and/or casting function syntax
> is.

Yes, of course, but you couldn't do that with numbers embedded in
sequences, or with values that you compute (such as price * quantity).
Or rather, you could if you turned them into text nodes and assigned
the relevant value type to each text node through some magic that
isn't clear to me (I don't think that text nodes can have a type).
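The point about computed values can be sketched in Python, where the formatting decision is made by the value's type rather than by where the value came from. This is only an analogy: `Decimal` stands in for xs:decimal, the `render` helper is hypothetical, and the format spec `,.2f` approximates what `format-number(., '#,##0.00')` asks for (comma grouping, exactly two fraction digits).

```python
from decimal import Decimal


def render(item) -> str:
    """Format any decimal value uniformly, whatever produced it."""
    if isinstance(item, Decimal):
        # Approximately format-number(., '#,##0.00'): comma grouping and
        # exactly two digits after the decimal point.
        return f"{item:,.2f}"
    return str(item)


price = Decimal("1234.5")
quantity = 3
# A mixed sequence: strings, a stored value and a computed one.
items = ["Unit:", price, "Total:", price * quantity]
print(" ".join(render(item) for item in items))
```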

Say you had a structured string like path data in SVG which you wanted
to normalize. A sample un-normalized SVG path is:

  M200,300 L400,50 L600,300 L800,550 L1000,300

Say you wanted to normalize it and scale every coordinate down by a
factor of 10, giving:

  M 20 30 L 40 5 L 60 30 L 80 55 L 100 30

XML Schema list types are whitespace-separated, so as a list-typed
value the original path would be the sequence:

  ('M200,300', 'L400,50', 'L600,300', 'L800,550', 'L1000,300')

and I'll assume that each of these items is actually an svg:command
rather than an xs:string.

Given type-matching templates, I could do:

<xsl:template match="value of type svg:command">
  <xsl:variable name="instruction" type="svg:instruction"
                select="substring(., 1, 1)" />
  <xsl:variable name="x" type="svg:coordinate"
                select="substring-before(substring(., 2), ',')" />
  <xsl:variable name="y" type="svg:coordinate"
                select="substring-after(substring(., 2), ',')" />
  <xsl:apply-templates select="($instruction, $x, $y)" />
</xsl:template>

<xsl:template match="value of type svg:instruction">
  <xsl:value-of select="." />
  <xsl:text> </xsl:text>
</xsl:template>

<xsl:template match="value of type svg:coordinate">
  <xsl:value-of select=". div 10" />
  <xsl:text> </xsl:text>
</xsl:template>
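Python's `functools.singledispatch` gives a working analogue of these type-matching templates: each registered function plays the role of one `value of type ...` template, and calling `render` on a value corresponds to applying templates to it. The `Command`, `Instruction` and `Coordinate` classes are hypothetical stand-ins for the svg:command, svg:instruction and svg:coordinate types assumed in this message, and `Decimal` division stands in for `div`.

```python
from decimal import Decimal
from functools import singledispatch


# Hypothetical stand-ins for the schema types svg:command,
# svg:instruction and svg:coordinate.
class Command(str):
    pass


class Instruction(str):
    pass


class Coordinate(Decimal):
    pass


@singledispatch
def render(value) -> str:
    # No "template" is registered for this type.
    raise TypeError(f"no template matches {type(value).__name__}")


@render.register(Command)
def _(value: Command) -> str:
    # Split "M200,300" into typed parts, then "apply templates" to each.
    instruction = Instruction(value[0])
    x, y = value[1:].split(",")
    parts = (instruction, Coordinate(x), Coordinate(y))
    return "".join(render(part) for part in parts)


@render.register(Instruction)
def _(value: Instruction) -> str:
    return f"{value} "


@render.register(Coordinate)
def _(value: Coordinate) -> str:
    # Equivalent of <xsl:value-of select=". div 10" /> plus a trailing space.
    return f"{value / 10} "


path = "M200,300 L400,50 L600,300 L800,550 L1000,300"
commands = [Command(c) for c in path.split()]
print("".join(render(c) for c in commands).rstrip())
```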

just as without type-matching templates, I can do:

  <xsl:for-each select="$command">
    <xsl:variable name="instruction" type="svg:instruction"
                  select="substring(., 1, 1)" />
    <xsl:variable name="x" type="svg:coordinate"
                  select="substring-before(substring(., 2), ',')" />
    <xsl:variable name="y" type="svg:coordinate"
                  select="substring-after(substring(., 2), ',')" />
    <xsl:for-each select="($instruction, $x, $y)">
      <xsl:choose>
        <xsl:when test=". instance of svg:instruction">
          <xsl:value-of select="." />
        </xsl:when>
        <xsl:when test=". instance of svg:coordinate">
          <xsl:value-of select=". div 10" />
        </xsl:when>
      </xsl:choose>
      <xsl:text> </xsl:text>
    </xsl:for-each>
  </xsl:for-each>


In fact, if you think of nodes as just one possible data type, then
you could think of template match patterns as a basic form of data
type test. If patterns and data type tests could be unified...

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

