[xsl] Grouping - starting with certain element and ending before certain elements

Subject: [xsl] Grouping - starting with certain element and ending before certain elements
From: mathias.thiel@xxxxxxxxxxxxxxxxxxxxxx
Date: Wed, 5 Oct 2005 13:59:05 +0200
Dear all,

I have a grouping problem. I want to create nested sections of a
sequence of elements that starts with a certain element and ends
before certain elements. Apart from this, all elements should appear
in the original order. This is my input file:

<?xml version="1.0" encoding="iso-8859-1"?>
<dict>
    <entry id="000152">
      <HWD>ace</HWD>
      <LV2>I</LV2>
      <PSA>n</PSA>
      <LV3>1</LV3>
      <TSL>ess</TSL>
      <IDM>~ of hearts</IDM>
      <TSL>hjärteress</TSL>
      <IDM>have an ~ up one's sleeve</IDM>
      <SEA>bildl.</SEA>
      <TSL>ha trumf på hand</TSL>
      <LV3>2</LV3>
      <IDM>within an ~ of</IDM>
      <TSL>ytterst nära</TSL>
      <EXA>within an ~ of victory</EXA>
      <LV3>3</LV3>
      <TSL>ess</TSL>
      <LV3>4</LV3>
      <SEA>i tennis</SEA>
      <TSL>serveess</TSL>
      <IDM>She has already hit 13 ~s</IDM>
      <TSL>Hon hade redan tagit 13 serveess</TSL>
      <LV2>II</LV2>
      <PSA>attr adj</PSA>
      <IDM>~ reporter</IDM>
      <TSL>stjärnreporter</TSL>
   </entry>
</dict>

This is the desired output:

<dict>
    <entry id="000152">
      <HWD>ace</HWD>
      <LV2>I</LV2>
      <PSA>n</PSA>
      <LV3>1</LV3>
      <TSL>ess</TSL>
      <phrase>
            <IDM>~ of hearts</IDM>
            <TSL>hjärteress</TSL>
      </phrase>
      <phrase>
            <IDM>have an ~ up one's sleeve</IDM>
            <SEA>bildl.</SEA>
            <TSL>ha trumf på hand</TSL>
      </phrase>
      <LV3>2</LV3>
      <phrase>
            <IDM>within an ~ of</IDM>
            <TSL>ytterst nära</TSL>
            <EXA>within an ~ of victory</EXA>
      </phrase>
      <LV3>3</LV3>
      <TSL>ess</TSL>
      <LV3>4</LV3>
      <SEA>i tennis</SEA>
      <TSL>serveess</TSL>
      <phrase>
            <IDM>She has already hit 13 ~s</IDM>
            <TSL>Hon hade redan tagit 13 serveess</TSL>
      </phrase>
      <LV2>II</LV2>
      <PSA>attr adj</PSA>
      <phrase>
            <IDM>~ reporter</IDM>
            <TSL>stjärnreporter</TSL>
      </phrase>
   </entry>
</dict>

So I want to keep the order of all elements but group each <IDM>
element followed by a number of siblings up to another <IDM>, a <LV2>
or a <LV3> element and wrap this sequence in a <phrase> section. This
example is a simplified sample of my data; there are other "stop"
elements as well as other elements that can appear both inside
<phrase> sections and as children of <entry>.

My feeble attempt resulted in the following stylesheet:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="iso-8859-1" indent="yes"/>
      <xsl:strip-space elements="*"/>
      <xsl:template match="dict">
            <xsl:copy>
                  <xsl:copy-of select="@*"/>
                  <xsl:apply-templates/>
            </xsl:copy>
      </xsl:template>

      <xsl:template match="entry">
      <xsl:copy>
            <xsl:copy-of select="HWD"/>
            <xsl:choose>
                  <xsl:when test="*[1][not(self::IDM) ]">
                        <xsl:for-each select="HWD">
                              <xsl:apply-templates select="following-sibling::*[1]" mode="copy"/>
                        </xsl:for-each>
                  </xsl:when>
                  <xsl:otherwise>
                        <xsl:apply-templates/>
                  </xsl:otherwise>
            </xsl:choose>
      </xsl:copy>
      </xsl:template>

      <xsl:template match="*" mode="copy">
                  <xsl:copy-of select="."/>
                  <xsl:apply-templates select="following-sibling::*[1][not(self::IDM)]" mode="copy"/>
                  <xsl:if test="following-sibling::*[1][self::IDM]">
                        <xsl:apply-templates select="following-sibling::*[1]"/>
                  </xsl:if>
      </xsl:template>

      <xsl:template match="IDM">
            <phrase>
                  <xsl:copy-of select="."/>
                  <xsl:apply-templates select="following-sibling::*[1][not(self::IDM)]" mode="copy"/>
            </phrase>
      </xsl:template>
</xsl:stylesheet>

But applied to my input this produces the unwanted output:

<?xml version="1.0" encoding="iso-8859-1"?>
<dict>
   <entry>
      <HWD>ace</HWD>
      <LV2>I</LV2>
      <PSA>n</PSA>
      <LV3>1</LV3>
      <TSL>ess</TSL>
      <phrase>
         <IDM>~ of hearts</IDM>
         <TSL>hjärteress</TSL>
         <phrase>
            <IDM>have an ~ up one's sleeve</IDM>
            <SEA>bildl.</SEA>
            <TSL>ha trumf på hand</TSL>
            <LV3>2</LV3>
            <phrase>
               <IDM>within an ~ of</IDM>
               <TSL>ytterst nära</TSL>
               <EXA>within an ~ of victory</EXA>
               <LV3>3</LV3>
               <TSL>ess</TSL>
               <LV3>4</LV3>
               <SEA>i tennis</SEA>
               <TSL>serveess</TSL>
               <phrase>
                  <IDM>She has already hit 13 ~s</IDM>
                  <TSL>Hon hade redan tagit 13 serveess</TSL>
                  <LV2>II</LV2>
                  <PSA>attr adj</PSA>
                  <phrase>
                     <IDM>~ reporter</IDM>
                     <TSL>stjärnreporter</TSL>
                  </phrase>
               </phrase>
            </phrase>
         </phrase>
      </phrase>
   </entry>
</dict>

I can start a <phrase> section, but I don't know how to express that
it should end. This seems to be a trivial problem, but not to me. I
had a look at the XSLT 2.0 element <xsl:for-each-group>, and I
thought my problems were solved when I saw that it has a
group-starting-with and a group-ending-with attribute. I thought they
would make it possible to specify the first element of a sequence, in
this case group-starting-with="IDM", and the last one, in my input
the one immediately before certain elements, i.e.
group-ending-with="following-sibling::*[1][self::IDM] or
following-sibling::*[1][self::LV2] or
following-sibling::*[1][self::LV3]", in the same
<xsl:for-each-group>. But this is apparently not the way it works, as
the attributes are mutually exclusive.
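
One possible way around the mutual exclusivity, offered only as a
sketch: use group-starting-with alone and let every "stop" element
open a group as well, so that a phrase ends wherever the next IDM, LV2
or LV3 begins. Only the groups whose first element is an IDM are then
wrapped in <phrase>. The sketch assumes an XSLT 2.0 processor and that
IDM, LV2 and LV3 are the only boundary elements; any further "stop"
elements in the real data would have to be added to the pattern.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" encoding="iso-8859-1" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="dict">
            <xsl:copy>
                  <xsl:copy-of select="@*"/>
                  <xsl:apply-templates/>
            </xsl:copy>
      </xsl:template>

      <xsl:template match="entry">
            <xsl:copy>
                  <xsl:copy-of select="@*"/>
                  <!-- Assumption: IDM, LV2 and LV3 are the only boundaries.
                       Each of them starts a new group, which implicitly
                       ends the group before it. -->
                  <xsl:for-each-group select="*" group-starting-with="IDM | LV2 | LV3">
                        <xsl:choose>
                              <!-- The context item is the first element of the
                                   current group; wrap only groups led by an IDM. -->
                              <xsl:when test="self::IDM">
                                    <phrase>
                                          <xsl:copy-of select="current-group()"/>
                                    </phrase>
                              </xsl:when>
                              <!-- All other groups (HWD, LV2..., LV3...) are
                                   copied through unchanged, in document order. -->
                              <xsl:otherwise>
                                    <xsl:copy-of select="current-group()"/>
                              </xsl:otherwise>
                        </xsl:choose>
                  </xsl:for-each-group>
            </xsl:copy>
      </xsl:template>
</xsl:stylesheet>
```

With the sample entry above, the groups would begin at HWD, LV2, LV3,
IDM, IDM, LV3, IDM, LV3, LV3, IDM, LV2 and IDM, so each IDM group
closes just before the next boundary element and no <phrase> is nested
inside another.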

For any help I'd be most grateful.

Mathias
