Re: [xsl] parsing parens in the park

Subject: Re: [xsl] parsing parens in the park
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 28 Sep 2008 15:50:40 -0400
At 2008-09-28 15:29 -0400, Syd Bauman wrote:
Thank you very much for your speedy response. (Relativistic, I
daresay, as your response was sent over 3 hours before I posted. :-)

Which is when I noticed I never restarted my Eudora between leaving California and returning to Kars (eastern time zone). This message's timestamp should be correct. It took me a while to figure out what went wrong where (when?). I'm sending this at 15:50 EDT.


Yes and no. While you've saved me a bunch of time in hammering out
the details of analyze-string, (non-)matching-substring, and
regex-group() usage, it is the regexp for parsing over matched parens
that seems to be the hard part, at least to me.

It was to me as well, so I took it on as a challenge, but I couldn't see how to do it without shoehorning some traditional programming techniques (recursion with a depth counter) into XSLT 2. Perhaps someone can propose a regex-pure solution, but I couldn't think of one.


I am probably going to start messing with expressions like
   \(([^)]|\([^)]*\))*\)
in a bit.
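XPath 2.0 regular expressions have no recursion or balancing constructs, so any single pattern recognizes nesting only to a fixed depth. Note also that [^)] in the expression above matches "(" as well, so the first alternative can swallow an opening paren; excluding both parens with [^()] keeps the two alternatives apart. The minimal sketch below assumes an XSLT 2.0 processor such as Saxon invoked with -it:main (the template name "main" and the two sample strings are illustrative only) and applies that one-level pattern to two strings:

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<!--Sketch: a regex that handles exactly one level of nested parens.
    Run with an XSLT 2.0 processor, e.g. Saxon with -it:main-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<xsl:template name="main">
  <xsl:for-each select="'(a(b)c)', '(a(b(c))d)'">
    <!--[^()] excludes both parens, so nesting is limited to one level-->
    <xsl:analyze-string select="." regex="\(([^()]|\([^()]*\))*\)">
      <xsl:matching-substring>
        <xsl:value-of select="concat('matched: ', ., '&#x0A;')"/>
      </xsl:matching-substring>
      <!--no non-matching-substring: unmatched text is simply dropped-->
    </xsl:analyze-string>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

This prints "matched: (a(b)c)" for the first string but only "matched: (b(c))" for the second, because the outer pair there wraps two levels. Each additional level means nesting the pattern one more time, which is why a depth counter is the more general tool.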

Meanwhile, thank you very much, Ken, for posting this. Such efforts
always have positive side effects. In this case, I had not realized
that in XSLT 2 one can use
    <xsl:value-of select="'[Got',regex-group(1), 'with',regex-group(2),']'"/>
where I would have steadfastly stuck with
    <xsl:text>[Got </xsl:text>
    <xsl:value-of select="regex-group(1)"/>
    <xsl:text> with </xsl:text>
    <xsl:value-of select="regex-group(2)"/>
    <xsl:text>]&#x0A;</xsl:text>

Yes, using a sequence for messages is quite convenient at times. When this comes up in class it seems new to many of my students.
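One detail worth knowing: in XSLT 2.0, when xsl:value-of uses a select attribute, the items of the sequence are joined with a single space unless a separator attribute says otherwise. So the one-liner quoted above is close to, but not exactly the same as, the xsl:text version: it yields a space before "]" and no trailing newline. A minimal sketch, where the template name "main" and the literal strings standing in for regex-group() results are illustrative only:

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<xsl:template name="main">
  <!--default separator is a single space: [Got a with b ]-->
  <xsl:value-of select="'[Got', 'a', 'with', 'b', ']'"/>
  <xsl:text>&#x0A;</xsl:text>
  <!--separator="" with explicit spaces gives exact control: [Got a with b]-->
  <xsl:value-of select="'[Got ', 'a', ' with ', 'b', ']'" separator=""/>
  <xsl:text>&#x0A;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```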


I think I may have solved it below; please let me know if there are edge cases I didn't think of. I tried to keep it as general as possible, not knowing whether you were putting out document structure or flat strings. In the solution below I put out one element for each test string, with the name/value pairs as its attributes.

I hope this helps.

. . . . . . . . . . . . . Ken


t:\ftemp>type syd.xml
<?xml version="1.0" encoding="US-ASCII"?>
<tests>
<test>name1(value1)name2(with(nested(parens((inside)of) value))name3(abc)</test>
<test>name1(value1)name2( with(nested(parens((inside)of) value)more)name3()</test>
<test>name1 (value1) name2 (value2) name3 (value3)</test>
<test>name1 (value1)name2 () name3(value3)</test>
</tests>


t:\ftemp>call xslt2 syd.xml syd.xsl syd.out

t:\ftemp>type syd.out
<?xml version="1.0" encoding="UTF-8"?>
<results>
<result name1="value1" name2="with((nested(parens(inside)of) value))" name3="abc"/>
<result name1="value1" name2=" with((nested(parens(inside)of) value)more)" name3=""/>
<result name1="value1" name2="value2" name3="value3"/>
<result name1="value1" name2="" name3="value3"/>
</results>
t:\ftemp>type syd.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:c="uri:x-Crane"
exclude-result-prefixes="c"
version="2.0">


<xsl:output indent="yes"/>

<xsl:template match="tests">
  <results>
    <xsl:apply-templates select="test"/>
  </results>
</xsl:template>

<xsl:template match="test">
  <!--
  Debug: <xsl:value-of select="."/>
  <xsl:text>
</xsl:text>
  -->
  <result>
    <!--use a called template because of the need to generate structure-->
    <xsl:call-template name="c:values"/>
  </result>
</xsl:template>

<!--process a name/value specification-->
<xsl:template name="c:name-value-pair">
  <xsl:param name="name"/>
  <xsl:param name="value"/>

  <xsl:attribute name="{$name}" select="$value"/>
</xsl:template>

<!--find the first value specification in a string of value specifications-->
<xsl:template name="c:values">
  <xsl:param name="rest" select="."/>
  <xsl:if test="$rest">
    <!--walk through the string-->
    <xsl:analyze-string select="$rest" regex="\s*(\i\c*)\s*\((.*?)([()])(.*)">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="regex-group(3)='('">
            <!--at the start of a nested parenthesized value, so find it-->
            <xsl:variable name="this-rest"
                          select="c:nested-value('',regex-group(4),1)"/>
            <!--process the name/value pair-->
            <xsl:call-template name="c:name-value-pair">
              <xsl:with-param name="name" select="regex-group(1)"/>
              <xsl:with-param name="value" select="concat(regex-group(2),'(',
                                                          $this-rest[1],')')"/>
            </xsl:call-template>
            <!--find the next value specification-->
            <xsl:call-template name="c:values">
              <xsl:with-param name="rest" select="$this-rest[2]"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <!--process the name/value pair-->
            <xsl:call-template name="c:name-value-pair">
              <xsl:with-param name="name" select="regex-group(1)"/>
              <xsl:with-param name="value" select="regex-group(2)"/>
            </xsl:call-template>
            <!--find the next value specification-->
            <xsl:call-template name="c:values">
              <xsl:with-param name="rest" select="regex-group(4)"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:message>
          <xsl:text>Whoops!  What happened here? Not expected: </xsl:text>
          <xsl:value-of select="."/>
        </xsl:message>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:if>
</xsl:template>

<!--recursively determine a parenthesized value assuming balanced parens-->
<xsl:function name="c:nested-value">
  <!--returning a sequence of two values: the value and the rest-->
  <xsl:param name="this"/>
  <xsl:param name="rest"/>
  <xsl:param name="depth"/>
  <xsl:if test="$rest">
    <!--there is still more to do-->
    <xsl:analyze-string select="$rest" regex="(.*?)([()])(.*)">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="regex-group(2)='('">
            <!--yet another nested value-->
            <xsl:sequence select="c:nested-value( concat($this,'(',
                                                         regex-group(1)),
                                                  regex-group(3),
                                                  $depth + 1)"/>
          </xsl:when>
          <xsl:when test="$depth=1">
            <!--found the last balanced parenthesis-->
            <xsl:sequence select="concat($this,regex-group(1)),
                                  regex-group(3)"/>
          </xsl:when>
          <xsl:otherwise>
            <!--found a nested balanced parenthesis-->
            <xsl:sequence select="c:nested-value(concat($this,regex-group(1),
                                                        ')'),
                                                 regex-group(3),
                                                 $depth - 1)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:message>
          <xsl:text>Nested whoops!  Not expected: </xsl:text>
          <xsl:value-of select="."/>
        </xsl:message>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:if>
</xsl:function>

</xsl:stylesheet>
t:\ftemp>rem Done!
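For comparison, here is an alternative sketch that locates the closing paren with an explicit depth counter instead of rebuilding the value piece by piece. Two things in the run above motivate it. First, the parens inside name2 come out shifted: the input has "with(nested(parens((inside)of) value))" but the output has "with((nested(parens(inside)of) value))", because the "(" branch of c:nested-value builds concat($this,'(',regex-group(1)) and so places each "(" ahead of the text that preceded it. Second, the first two test strings each contain seven "(" but only six ")". The helper c:close-pos is hypothetical, not part of the stylesheet above; it assumes an XSLT 2.0 processor and returns 0 when the parens never balance, so unbalanced input is reported rather than guessed at.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:c="uri:x-Crane"
                exclude-result-prefixes="xs c"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:template match="tests">
  <results>
    <xsl:apply-templates select="test"/>
  </results>
</xsl:template>

<xsl:template match="test">
  <result>
    <xsl:call-template name="c:values"/>
  </result>
</xsl:template>

<!--hypothetical helper: position of the ')' that closes an already-open '('
    in $s, or 0 if there is none; $depth counts parens still open (start at 1)-->
<xsl:function name="c:close-pos" as="xs:integer">
  <xsl:param name="s" as="xs:string"/>
  <xsl:param name="pos" as="xs:integer"/>
  <xsl:param name="depth" as="xs:integer"/>
  <xsl:variable name="ch" select="substring($s, $pos, 1)"/>
  <xsl:choose>
    <xsl:when test="$ch = ''">
      <!--ran off the end: unbalanced-->
      <xsl:sequence select="0"/>
    </xsl:when>
    <xsl:when test="$ch = ')' and $depth = 1">
      <xsl:sequence select="$pos"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="c:close-pos($s, $pos + 1,
                              $depth + (if ($ch = '(') then 1
                                        else if ($ch = ')') then -1
                                        else 0))"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

<!--emit one attribute per name(value) specification, left to right-->
<xsl:template name="c:values">
  <xsl:param name="rest" as="xs:string" select="string(.)"/>
  <xsl:analyze-string select="$rest" regex="^\s*(\i\c*)\s*\((.*)$" flags="s">
    <xsl:matching-substring>
      <xsl:variable name="name" select="regex-group(1)"/>
      <xsl:variable name="after" select="regex-group(2)"/>
      <xsl:variable name="end" select="c:close-pos($after, 1, 1)"/>
      <xsl:choose>
        <xsl:when test="$end = 0">
          <xsl:message select="'Unbalanced parentheses in:', $rest"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:attribute name="{$name}" select="substring($after, 1, $end - 1)"/>
          <xsl:call-template name="c:values">
            <xsl:with-param name="rest" select="substring($after, $end + 1)"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <!--trailing whitespace is fine; anything else is reported-->
      <xsl:if test="normalize-space(.)">
        <xsl:message select="'Not expected:', string(.)"/>
      </xsl:if>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>
```

Against the four test strings above, this should produce all three attributes for the last two, and for the first two keep name1 and report the unbalanced remainder with a message. A balanced value such as "name2(a(b)c)" comes out as name2="a(b)c". The recursion in c:close-pos goes one level per character, which is fine for short strings like these.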



--
Upcoming XSLT/XSL-FO hands-on courses:      Wellington, NZ 2009-01
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
