Re: [xsl] two regexp related questions

Subject: Re: [xsl] two regexp related questions
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Thu, 19 May 2011 22:24:16 +0200
On 2011-05-19 21:16, Julian Reschke wrote:
On 2011-05-19 20:51, Brandon Ibach wrote:
For 2), if you're using the regex to both validate the input (making
sure it conforms to the required syntax) and parse/extract the
name/value pairs, you might be able to make the job easier by breaking
these two tasks apart. Use the regex as you have it now to validate
the input and then, if it matches, use a shorter regex that matches
just a single name/value pair with analyze-string to do the actual
processing.

-Brandon :)

That's more or less what I do now. But as long as the regex contains a repeating pattern, <xsl:matching-substring> will only be invoked once, and the regex-group function will only return the contents for the last match, right?

I think it depends on the implementation. I couldn't see anything in the spec about what regex-group(3) of
([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))*
should be. In Saxon, it's ';e=f' for your example, but in principle it could also be ';c=d'.


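As Brandon pointed out, using analyze-string with a repeating pattern that matches the entire string is not the best approach. There are more natural approaches that work without recursion. I have sketched two of them below (tokenize/replace and analyze-string with a single-pair regex), followed by the full-regex variant for comparison.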

Input:
<foo>a=b;c=d;e=f</foo>

XSL:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">


<xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <variants>
      <var type="tokenize replace">
        <xsl:apply-templates mode="tok" />
      </var>
      <var type="analyze-string">
        <xsl:apply-templates mode="as" />
      </var>
      <var type="analyze-string full regex">
        <xsl:apply-templates mode="as-full" />
      </var>
    </variants>
  </xsl:template>

  <xsl:template match="foo" mode="tok">
    <xsl:copy>
      <xsl:for-each select="tokenize(., ';')">
        <item name="{replace(., '=.+', '')}" value="{replace(., '.+=', '')}" />
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>


  <xsl:template match="foo" mode="as">
    <xsl:copy>
      <xsl:analyze-string select="." regex="([a-z]+)=([a-z]+);?" flags="i">
        <xsl:matching-substring>
          <item name="{regex-group(1)}" value="{regex-group(2)}" />
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="foo" mode="as-full">
    <xsl:copy>
      <xsl:analyze-string select="." regex="([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))*" flags="i">
        <xsl:matching-substring>
          <item name="{regex-group(1)}" value="{regex-group(2)}" rest="{regex-group(3)}"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>


</xsl:stylesheet>

Output:
<variants>
   <var type="tokenize replace">
      <foo>
         <item name="a" value="b"/>
         <item name="c" value="d"/>
         <item name="e" value="f"/>
      </foo>
   </var>
   <var type="analyze-string">
      <foo>
         <item name="a" value="b"/>
         <item name="c" value="d"/>
         <item name="e" value="f"/>
      </foo>
   </var>
   <var type="analyze-string full regex">
      <foo>
         <item name="a" value="b" rest=";e=f"/>
      </foo>
   </var>
</variants>
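
Brandon's suggestion (validate with the full regex first, then extract with a shorter one) can be combined with the analyze-string variant above. The following is only a minimal sketch under a few assumptions: the anchored pattern passed to matches() is one reading of "the regex as you have it now", and the <invalid> element is a hypothetical error marker that does not appear anywhere in the original stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="xml" indent="yes" />

  <xsl:template match="foo">
    <xsl:copy>
      <xsl:choose>
        <!-- Step 1: the anchored full regex is used only to validate the syntax. -->
        <xsl:when test="matches(., '^[a-z]+=[a-z]+(;[a-z]+=[a-z]+)*$', 'i')">
          <!-- Step 2: a single-pair regex; matching-substring is invoked once per pair. -->
          <xsl:analyze-string select="." regex="([a-z]+)=([a-z]+)" flags="i">
            <xsl:matching-substring>
              <item name="{regex-group(1)}" value="{regex-group(2)}" />
            </xsl:matching-substring>
          </xsl:analyze-string>
        </xsl:when>
        <xsl:otherwise>
          <!-- Hypothetical marker for input that fails validation (an assumption). -->
          <invalid><xsl:value-of select="." /></invalid>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input this should produce the same three item elements as the analyze-string variant. Because the string has already been validated, the extraction regex no longer needs the optional trailing semicolon; the separators simply fall into the non-matching substrings and are ignored.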

-Gerrit

Best regards, Julian



--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
