Re: [xsl] java Regex call

Subject: Re: [xsl] java Regex call
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Thu, 10 Jul 2003 13:51:53 +0100

Dave,

>> So the string is broken into two matching substrings ('ABC_PARA'
>> and '_PARA') with a non-matching substring of '_' in between. And
>> you get two lots of output because you're generating one for each
>> of the matching substrings.
>
> Ah! Getting very close to stateful here Jeni?

In what way?

> I hadn't figured the 'iteration' idea. I'd have expected (without
> thinking too closely) all the matches to have come up in the
> <xsl:matching-substring> section, then all the non matching in the
> following <xsl:non-matching-substring> section.

I can't think of many situations in which that would be what you want.

A typical example of when you might want to use <xsl:analyze-string>
is to replace all the newline characters in a string with <br>
elements. You can do this with:

  <xsl:analyze-string select="$string" regex="\n">
    <xsl:matching-substring>
      <br />
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>

If you had "all matching substrings, then all non-matching substrings"
then you'd get all the <br> elements and then all the text, which
wouldn't be much good.
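
As a minimal sketch of how that plays out, assume $string holds three
lines separated by newline characters (the value here is made up for
illustration):

  <xsl:variable name="string" select="'one&#10;two&#10;three'" />

The string is partitioned into five substrings, which are processed in
order: 'one' (non-matching), the first newline (matching), 'two'
(non-matching), the second newline (matching) and 'three'
(non-matching). So the instruction above should produce:

  one<br/>two<br/>three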

> http://www.w3.org/TR/xslt20/#element-analyze-string
> only hints at this 'ordering',

The spec says:

  The input string is thus partitioned into a sequence of substrings,
  some of which match the regular expression, others which do not
  match it. Each substring will contain at least one character. This
  sequence of substrings is processed using the xsl:matching-substring
  and xsl:non-matching-substring child instructions. A matching
  substring is processed using the xsl:matching-substring element, a
  non-matching substring using the xsl:non-matching-substring element.

As elsewhere in XPath 2.0, the term "sequence" means an ordered list
of items.
  
> While the xsl:matching-substring instruction is active, ... the
> regex-group parameter is sequential whilst active I guess.

I don't understand what you mean by "sequential".

> I might reasonably expect that regexp-group(4) would get hold of the
> fourth match string?

That's not what it does. regex-group() is used to get the value of the
substring matched by a subexpression in the regular expression. For
example, if you had a regular expression:

  (\d{4})-(\d{2})-(\d{2})

and matched the string "2003-07-10" then regex-group(1) would give you
"2003", regex-group(2) would give you "07" and regex-group(3) would
give you "10".
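
As a rough sketch of how that looks inside <xsl:analyze-string> (the
$date variable and the <year>, <month> and <day> element names are
made up for illustration): note that the regex attribute is an
attribute value template, so the curly brackets in the regular
expression have to be doubled:

  <xsl:variable name="date" select="'2003-07-10'" />
  <xsl:analyze-string select="$date"
                      regex="(\d{{4}})-(\d{{2}})-(\d{{2}})">
    <xsl:matching-substring>
      <!-- each regex-group(N) is the text matched by the Nth
           parenthesised subexpression -->
      <year><xsl:value-of select="regex-group(1)" /></year>
      <month><xsl:value-of select="regex-group(2)" /></month>
      <day><xsl:value-of select="regex-group(3)" /></day>
    </xsl:matching-substring>
  </xsl:analyze-string>

Here the whole string is a single matching substring, so you'd get one
<year>, one <month> and one <day> element, holding "2003", "07" and
"10" respectively.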

There's no easy way to get hold of the "fourth matching substring".
If you use the position() function within <xsl:matching-substring> or
<xsl:non-matching-substring> then you'll get the position of the
matching/non-matching substring amongst all the other (matching and
non-matching) substrings.
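
If you really do need the Nth matching substring, one possible
workaround is to collect just the matching substrings into a sequence
and index into that. This is a sketch only: it assumes the xs prefix
is bound to the XML Schema namespace, reuses the $input and $regex
variables from the code you quoted, and $matches is a made-up name:

  <xsl:variable name="matches" as="xs:string*">
    <xsl:analyze-string select="$input" regex="{$regex}">
      <xsl:matching-substring>
        <!-- the context item is the matching substring itself -->
        <xsl:sequence select="." />
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <xsl:value-of select="$matches[4]" />

Because there's no <xsl:non-matching-substring>, the non-matching
substrings are simply dropped, and $matches[4] is the empty sequence
if there are fewer than four matches.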

> Using:
>
>  <xsl:variable name="res" as="item()*">
>    <xsl:analyze-string select="$input" regex="{$regex}">
>       <xsl:matching-substring>
>         <GROUP1><xsl:value-of select="regex-group(1)" /></GROUP1>
>         <GROUP2><xsl:value-of select="regex-group(2)" /></GROUP2>
>         <GROUP3><xsl:value-of select="regex-group(3)" /></GROUP3>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         <mismatch><xsl:value-of select="."/></mismatch>
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>   </xsl:variable>
>
>   <xsl:copy-of select="$res"/>
>
>  gives:
>
> <GROUP1>ABC_PARA</GROUP1>
> <GROUP2>ABC</GROUP2>
> <GROUP3/>
> <mismatch>_</mismatch>
> <GROUP1>_PARA</GROUP1>
> <GROUP2/>
> <GROUP3/>
>
> Which is close to usable, though very messy.

I'm not sure what you're trying to get, so can't advise how to get it
cleanly.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list