RE: [xsl] xsl:analyze-string problem

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 8 Feb 2007 17:00:55 -0000
I would tackle this as follows:

Step 1: classify the element. Use xsl:choose and matches() to decide which
of the four categories it belongs to, and copy the element adding an
attribute to indicate the category.
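A minimal sketch of step 1, assuming an XSLT 2.0 processor such as Saxon and a hypothetical root element <doc> that holds the <e> elements. The attribute name "class" and its values "1" to "4" are illustrative, and an ordinal is assumed to be one or more digits followed by a period. The xsl:when branches are ordered so that category 3 (nothing but an ordinal) is tested before category 4 (starts with an ordinal).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Step 1: copy each e, adding a hypothetical class="1".."4". -->
  <xsl:template match="e">
    <xsl:variable name="class" as="xs:string">
      <xsl:choose>
        <!-- 3) just a number and a period -->
        <xsl:when test="matches(., '^\s*[0-9]+\.\s*$')">3</xsl:when>
        <!-- 4) number and period, then words and/or numbers -->
        <xsl:when test="matches(., '^\s*[0-9]+\.')">4</xsl:when>
        <!-- 2) words and/or numbers, then number and period -->
        <xsl:when test="matches(., '[0-9]+\.\s*$')">2</xsl:when>
        <!-- 1) just words and/or numbers -->
        <xsl:otherwise>1</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <e class="{$class}">
      <!-- Content (including leading/trailing spaces) is copied as is. -->
      <xsl:apply-templates select="@* | node()"/>
    </e>
  </xsl:template>

</xsl:stylesheet>
```

On the four sample elements from the original post, this should yield class="1", "2", "3" and "4" respectively, with the content, including leading and trailing spaces, copied unchanged.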

Step 2: do the grouping (concatenation of adjacent elements according to
your rule C). Probably using xsl:for-each-group group-adjacent, but I'm not
entirely clear about the criteria.
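One possible reading of rule C, sketched under similar assumptions: the input is a hypothetical <doc> whose <e> children already carry a category attribute from step 1 (called "class" here, values "1" to "4"). A new group starts at an e that begins with an ordinal (category 3 or 4) or that follows an e ending with one (category 2 or 3); every other e is appended to the current group. The sketch uses group-starting-with instead of group-adjacent, since whether an element opens a group depends on its predecessor as well as on itself.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Step 2: expects a hypothetical <doc> of <e class="1".."4"> elements. -->
  <xsl:template match="doc">
    <doc>
      <!-- A group starts at an e beginning with an ordinal (class 3 or 4)
           or following an e that ends with one (class 2 or 3).
           The first e always starts a group. -->
      <xsl:for-each-group select="e"
          group-starting-with="e[@class = ('3', '4')]
                             | e[preceding-sibling::e[1]/@class = ('2', '3')]">
        <!-- Join the string values with no separator, keeping all spaces. -->
        <e><xsl:value-of select="string-join(current-group(), '')"/></e>
      </xsl:for-each-group>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

For the four sample elements this should give three groups: " def ghi abc 22 def 3 ghi 1. " (the first two elements joined), " 2. " and " 3. def 35 78 ghi ".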

Step 3: use analyze-string on the contents of the grouped elements to insert
<ordinal> and <text> element children.
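A sketch of step 3, assuming the grouped output is a hypothetical <doc> with one <e> per group holding the concatenated text. It assumes an ordinal is one or more digits plus a period (a single [0-9], as in the original regex, matches only the last digit of "22."), and that, as in the original description, a number followed by a period never occurs inside running text. The inner xsl:analyze-string follows the original helper template: leading and trailing spaces stay as plain text, outside the new elements.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Step 3: expects a hypothetical <doc> of concatenated <e> groups. -->
  <xsl:template match="doc">
    <doc>
      <xsl:apply-templates select="e"/>
    </doc>
  </xsl:template>

  <!-- Each e is replaced by <ordinal> and <text> children. -->
  <xsl:template match="e">
    <xsl:analyze-string select="." regex="[0-9]+\.">
      <xsl:matching-substring>
        <ordinal><xsl:value-of select="."/></ordinal>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Leading/trailing spaces are output as plain text;
             whatever remains goes into a <text> element. -->
        <xsl:analyze-string select="." regex="^\s+|\s+$">
          <xsl:matching-substring>
            <xsl:value-of select="."/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <text><xsl:value-of select="."/></text>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For the group " def ghi abc 22 def 3 ghi 1. " this should produce a space, <text>def ghi abc 22 def 3 ghi</text>, a space, <ordinal>1.</ordinal> and a trailing space.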

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Yves Forkl [mailto:Y.Forkl@xxxxxx] 
> Sent: 08 February 2007 16:48
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] xsl:analyze-string problem
> 
> Hi XSLT 2.0 wizards,
> 
> while the syntax and semantics of xsl:analyze-string have
> become clear to me, I am now looking for an idiom using
> it that could help me solve this problem. (Or maybe for an
> alternative...)
> 
> In the input I find elements like these:
> 
> 1) <e> def ghi</e>
> 2) <e> abc 22 def 3 ghi 1. </e>
> 3) <e> 2. </e>
> 4) <e> 3. def 35 78 ghi </e>
> 
> The possible contents fit into exactly 4 classes:
> 
> 1) just some words and/or numbers
> 2) like 1), but followed by a number and a period
> 3) just a number and a period
> 4) like 3), but followed by some words and/or numbers
> 
> In each case, spaces may or may not appear at beginning and 
> end of the content and must be preserved (no matter to which 
> group they get attached).
> 
> The problem consists of replacing the original "e" element by 
> creating new elements according to these rules:
> 
> A) A number followed by a period goes into an "ordinal" element.
> B) Words and numbers go into a "text" element.
> C) In cases 1) and 4), where words and numbers appear at the 
> end, the content of the current "e" element must be 
> concatenated with all adjacent "e" elements of type 1) and 2) 
> before putting it all into the "text" element. By contrast,
> in cases 2) and 3), which end with a number and a period,
> the contents of the following "e" instance should never be appended.
> 
> My approach is to use the following templates:
> 
> <xsl:template match="e">
> 
>    <xsl:analyze-string select="." regex="^(.*?)( *[0-9]\. *)(.*)$">
>      <xsl:matching-substring>
> 
>        <xsl:for-each select="regex-group(1)">
>          <xsl:call-template name="create_element_and_space">
>            <xsl:with-param name="new_element_name" select="'text'"/>
>          </xsl:call-template>
>        </xsl:for-each>
> 
>        <xsl:for-each select="regex-group(2)">
>          <xsl:call-template name="create_element_and_space">
>            <xsl:with-param name="new_element_name" select="'ordinal'"/>
>          </xsl:call-template>
>        </xsl:for-each>
> 
>        <xsl:for-each select="regex-group(3)">
>          <xsl:call-template name="create_element_and_space">
>            <xsl:with-param name="new_element_name" select="'text'"/>
>          </xsl:call-template>
>        </xsl:for-each>
> 
>      </xsl:matching-substring>
> 
>    </xsl:analyze-string>
> 
>    <xsl:apply-templates select="following-sibling::e[1]"/>
> 
> </xsl:template>
> 
> 
> <!-- helper template for squeezing spaces out into mixed content -->
> <xsl:template name="create_element_and_space">
>    <xsl:param name="new_element_name"/>
> 
>    <xsl:analyze-string select="." regex="^\s+|\s+$">
> 
>      <xsl:matching-substring>
>        <xsl:value-of select="."/>
>      </xsl:matching-substring>
> 
>      <xsl:non-matching-substring>
>        <xsl:element name="{$new_element_name}">
>          <xsl:value-of select="."/>
>        </xsl:element>
>      </xsl:non-matching-substring>
> 
>    </xsl:analyze-string>
> 
> </xsl:template>
> 
> 
> What is not clear to me is:
> 
> - whether the regex actually suffices to match the rules
> 
> - whether it is a good idea to use xsl:for-each there
>
> - how to ensure concatenation of all the "e" instances'
> contents in cases 1) and 4) without processing them
> repeatedly - i.e.: how can I restrict the call to
> xsl:apply-templates to cases 2) and 3)?
> 
> Any comments would be greatly appreciated.
> 
>    Yves
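On the last question: an alternative that keeps the sibling-recursion style of the original templates (instead of grouping) is to pass the text collected so far down the chain as a parameter and write it out only when a run ends, so each e is processed exactly once. A minimal sketch, assuming a hypothetical <doc> root and an ordinal of one or more digits plus a period; the parameter name "acc" is illustrative. It emits one <e> per run of concatenated content, which the analyze-string pass (step 3 above) can then split into <ordinal> and <text>.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Start the sibling chain at the first e only. -->
  <xsl:template match="doc">
    <doc>
      <xsl:apply-templates select="e[1]"/>
    </doc>
  </xsl:template>

  <xsl:template match="e">
    <!-- Text collected from earlier members of the current run. -->
    <xsl:param name="acc" as="xs:string" select="''"/>
    <xsl:variable name="text" as="xs:string" select="concat($acc, .)"/>
    <xsl:variable name="next" as="element(e)?" select="following-sibling::e[1]"/>
    <xsl:choose>
      <!-- The run continues only if this e ends with words/numbers
           (case 1 or 4) and the next e starts with them (case 1 or 2). -->
      <xsl:when test="exists($next)
                      and not(matches(., '[0-9]+\.\s*$'))
                      and not(matches($next, '^\s*[0-9]+\.'))">
        <xsl:apply-templates select="$next">
          <xsl:with-param name="acc" select="$text"/>
        </xsl:apply-templates>
      </xsl:when>
      <xsl:otherwise>
        <!-- The run ends here: emit it once, then start a fresh run. -->
        <e><xsl:value-of select="$text"/></e>
        <xsl:apply-templates select="$next"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Recursion depth grows with the number of e siblings, which may matter for very long sibling lists.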
