[xsl] xsl:analyze-string problem

Subject: [xsl] xsl:analyze-string problem
From: Yves Forkl <Y.Forkl@xxxxxx>
Date: Thu, 08 Feb 2007 17:47:58 +0100

Hi XSLT 2.0 wizards,

While the syntax and semantics of xsl:analyze-string have become clear to me, I am now looking for an idiom based on it that could help me solve the following problem (or perhaps for an alternative to it).

In the input I find elements like these:

1) <e> def ghi</e>
2) <e> abc 22 def 3 ghi 1. </e>
3) <e> 2. </e>
4) <e> 3. def 35 78 ghi </e>

The possible contents fit into exactly 4 classes:

1) just some words and/or numbers
2) like 1), but followed by a number and a period
3) just a number and a period
4) like 3), but followed by some words and/or numbers
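
To make the four classes concrete, here is a minimal sketch of how they might be told apart with matches(). The function name my:class, its namespace URI and the "class" attribute are hypothetical and only serve as illustration. It also assumes that an ordinal is one or more digits directly followed by a period, which is slightly more general than the single-digit samples above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:x-example:my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper (not part of the original stylesheet):
       returns 1, 2, 3 or 4, the class of an "e" element's string content.
       Assumption: an ordinal is one or more digits followed by a period. -->
  <xsl:function name="my:class" as="xs:integer">
    <xsl:param name="content" as="xs:string"/>
    <xsl:sequence select="
      if (matches($content, '^\s*[0-9]+\.\s*$')) then 3
      else if (matches($content, '^\s*[0-9]+\.')) then 4
      else if (matches($content, '[0-9]+\.\s*$')) then 2
      else 1"/>
  </xsl:function>

  <!-- Usage: annotate each "e" with its class, for inspection only -->
  <xsl:template match="e">
    <e class="{my:class(string(.))}">
      <xsl:value-of select="."/>
    </e>
  </xsl:template>

</xsl:stylesheet>
```

The order of the tests matters: class 3 has to be checked first, because a lone ordinal would otherwise also satisfy the tests for classes 4 and 2.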

In each case, spaces may or may not appear at the beginning and end of the content. They must be preserved, but it does not matter which group they end up attached to.

The task is to replace each original "e" element with new elements, according to these rules:

A) A number followed by a period goes into an "ordinal" element.
B) Words and numbers go into a "text" element.
C) In cases 1) and 4), where words and numbers appear at the end, the content of the current "e" element must be concatenated with the content of all adjacent "e" elements of type 1) and 2) before it is all put into the "text" element. By contrast, in cases 2) and 3), which end with a number and a period, the content of the following "e" element must never be appended.


My approach is to use the following templates:

<xsl:template match="e">

<xsl:analyze-string select="." regex="^(.*?)( *[0-9]\. *)(.*)$">

<xsl:matching-substring>

      <xsl:for-each select="regex-group(1)">
        <xsl:call-template name="create_element_and_space">
          <xsl:with-param name="new_element_name" select="'text'"/>
        </xsl:call-template>
      </xsl:for-each>

      <xsl:for-each select="regex-group(2)">
        <xsl:call-template name="create_element_and_space">
          <xsl:with-param name="new_element_name" select="'ordinal'"/>
        </xsl:call-template>
      </xsl:for-each>

      <xsl:for-each select="regex-group(3)">
        <xsl:call-template name="create_element_and_space">
          <xsl:with-param name="new_element_name" select="'text'"/>
        </xsl:call-template>
      </xsl:for-each>

</xsl:matching-substring>

</xsl:analyze-string>

<xsl:apply-templates select="following-sibling::e[1]"/>

</xsl:template>


<!-- helper template for squeezing spaces out into mixed content -->
<xsl:template name="create_element_and_space">
<xsl:param name="new_element_name"/>

<xsl:analyze-string select="." regex="^\s+|\s+$">

    <xsl:matching-substring>
      <xsl:value-of select="."/>
    </xsl:matching-substring>

    <xsl:non-matching-substring>
      <xsl:element name="{$new_element_name}">
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:non-matching-substring>

</xsl:analyze-string>

</xsl:template>


What is not clear to me is:


- whether the regex actually suffices to match the rules

- whether it is a good idea to use xsl:for-each there

- how to ensure that the contents of all the "e" instances are concatenated in cases 1) and 4) without processing them repeatedly, i.e. how can I restrict the call to xsl:apply-templates to cases 2) and 3)?
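
One possible direction for the last point, offered only as a rough sketch and not as a tested solution: instead of recursing via following-sibling, the "e" elements could be collected into runs with xsl:for-each-group, so that each one is visited exactly once. The parent element name "list" and the wrapper element "run" are hypothetical. The sketch further assumes that all "e" elements are siblings and that an ordinal is one or more digits followed by a period. A new run starts at every "e" that begins with an ordinal (classes 3 and 4) and after every "e" that ends with one (classes 2 and 3).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the "e" elements are children of a common parent,
       hypothetically named "list" here. -->
  <xsl:template match="list">
    <xsl:copy>
      <xsl:for-each-group select="e"
          group-starting-with="e[matches(., '^\s*[0-9]+\.')
                                 or preceding-sibling::e[1][matches(., '[0-9]+\.\s*$')]]">
        <!-- "run" is a hypothetical wrapper that only makes the groups visible;
             separator="" joins the contents without adding any spaces -->
        <run>
          <xsl:value-of select="current-group()" separator=""/>
        </run>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the four sample elements this should give three runs: the first two elements joined together, then the third and the fourth each on their own. Splitting a run into "ordinal" and "text" elements is deliberately left out. Note that a run may contain an ordinal at both ends, which a single application of the regex above would not cover.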

Any comments would be greatly appreciated.

Yves
