[xsl] xsl:analyze-string and multiple matching groups

Subject: [xsl] xsl:analyze-string and multiple matching groups
From: Florent Georges <lists@xxxxxxxxxxxx>
Date: Wed, 1 Feb 2012 14:42:39 +0000 (GMT)
  Hi,

  Let's say I have a string of the form "a:b;c:d;" where there can be
any number of sub-parts of the form "x:y;", that I'd like to parse
using xsl:analyze-string.  With the following regex:

    ^(([a-z]):([a-z]);)+$

which does indeed match, I cannot use the regex groups to retrieve
all values.  For instance the following:

    <xsl:analyze-string select="'a:b;c:d;'"
                        regex="^(([a-z]):([a-z]);)+$">
       <xsl:matching-substring>
          <group num="0" value="{ regex-group(0) }"/>
          <group num="1" value="{ regex-group(1) }"/>
          <group num="2" value="{ regex-group(2) }"/>
          <group num="3" value="{ regex-group(3) }"/>
          <group num="4" value="{ regex-group(4) }"/>
          <group num="5" value="{ regex-group(5) }"/>
          <group num="6" value="{ regex-group(6) }"/>
          <group num="7" value="{ regex-group(7) }"/>
       </xsl:matching-substring>
    </xsl:analyze-string>

returns the following:

    <group num="0" value="a:b;c:d;"/>
    <group num="1" value="c:d;"/>
    <group num="2" value="c"/>
    <group num="3" value="d"/>
    <group num="4" value=""/>
    <group num="5" value=""/>
    <group num="6" value=""/>
    <group num="7" value=""/>

when I would have expected the following instead:

    <group num="0" value="a:b;c:d;"/>
    <group num="1" value="a:b;"/>
    <group num="2" value="a"/>
    <group num="3" value="b"/>
    <group num="4" value="c:d;"/>
    <group num="5" value="c"/>
    <group num="6" value="d"/>
    <group num="7" value=""/>

  That is, I expected the regex groups to match the "dynamic"
number of groups, instead of the strict "static" or "lexical"
group numbering from the regex string.  I thought that was what
I was used to in Perl and other tools, but I can't recall for
sure, and I didn't find a definitive answer in the spec.

  Are my expectations wrong?  If so, why?  And if so, is there
any general solution to this problem?  (By "general", I mean not
recursing on the string and using substring on ';', because here
this is a simple delimiter.)
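
  A minimal sketch of one possible workaround, offered as an
assumption rather than a definitive answer: drop the anchors and the
outer repetition, so that xsl:analyze-string itself iterates over each
"x:y;" sub-part and the groups are renumbered for every match.  Since
the unanchored regex no longer checks the string as a whole, the
sketch validates it first with matches().  The names "main", "input",
"parts", "part", "key" and "value" are hypothetical, chosen only for
illustration.

```xslt
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output indent="yes"/>

   <!-- Hypothetical entry point, to be invoked as the initial template. -->
   <xsl:template name="main">
      <!-- Hypothetical variable holding the string to parse. -->
      <xsl:variable name="input" select="'a:b;c:d;'"/>
      <parts>
         <!-- Check the overall shape first: the regex used below is
              unanchored and would silently skip malformed text. -->
         <xsl:if test="matches($input, '^([a-z]:[a-z];)+$')">
            <!-- One match per "x:y;" sub-part, so regex-group(1) and
                 regex-group(2) are bound anew for each sub-part. -->
            <xsl:analyze-string select="$input" regex="([a-z]):([a-z]);">
               <xsl:matching-substring>
                  <part whole="{ regex-group(0) }"
                        key="{ regex-group(1) }"
                        value="{ regex-group(2) }"/>
               </xsl:matching-substring>
            </xsl:analyze-string>
         </xsl:if>
      </parts>
   </xsl:template>

</xsl:stylesheet>
```

  With Saxon this could be run by naming the initial template on the
command line (the -it:main option).  For the input above it should
produce one part element per sub-part, the first with key "a" and
value "b", the second with key "c" and value "d".  It does not make a
single match expose a "dynamic" number of groups; it only sidesteps
the need for that.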

  BTW, tested with Saxon HE 9.3.0.5 and 9.4.0.2.

  Regards,

-- 
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/
