[xsl] Combining consecutive siblings

Subject: [xsl] Combining consecutive siblings
From: "Trevor Nicholls" <trevor@xxxxxxxxxxxxxxxxxx>
Date: Tue, 13 Jun 2006 18:41:55 +1200
Hello

In some contexts the XML I am processing contains a run of consecutive
elements which I want to merge into a single element in the output XML. I
have adapted a technique which I found in the archives (it was called
"grouping consecutive elements" then, and didn't do precisely what I wanted,
but seemed to be the right approach).

Here's a cut-down piece of XML which highlights the problem:
--------
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Body>If you add this to the enquiry:</Body>
<TABLE><ROW><CELL>
<SB>where </SB>
<SB>  <File>client</File> has <Field>country</Field>="NZ"</SB>
<SB>  and <Field>city</Field>="WELLINGTON"</SB>
<SB>  and <Field>name</Field> matches (plum* ~plumlee*)</SB>
<SB>  and val(<Field>street</Field>[1,pos(" ",<Field>street</Field>)-1]) &gt; 0</SB>
<SB>list</SB>
<SB>  <File>client</File>:<Field>name</Field> <Field>address</Field></SB>
</CELL></ROW></TABLE>
</root>
--------

The requirement is to collapse all the consecutive <SB> elements into a
single <syntax> element, with newlines in the output reflecting the multiple
<SB> nodes, and retaining any lower level structure.
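
To make the target concrete, here is a hand-assembled sketch of the result I am aiming for with the sample above. It assumes the output element names used by the stylesheet further down (<document>, <para>, <table>, <syntax>, <f>); the exact whitespace is not important.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<document>
<para>If you add this to the enquiry:</para>
<table><tbody><tr><td>
<!-- one <syntax> for the whole run of <SB> elements, one line per <SB> -->
<syntax>where 
<f>client</f> has <f>country</f>="NZ"
  and <f>city</f>="WELLINGTON"
  and <f>name</f> matches (plum* ~plumlee*)
  and val(<f>street</f>[1,pos(" ",<f>street</f>)-1]) &gt; 0
list
<f>client</f>:<f>name</f><f>address</f></syntax>
</td></tr></tbody></table></document>
```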

This is the XSL I have arrived at:
--------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="xml" encoding="UTF-8"/>

<xsl:template match="root">
<document><xsl:apply-templates /></document>
</xsl:template>

<xsl:template match="Body">
<xsl:call-template name="nl" />
<para><xsl:apply-templates /></para>
</xsl:template>

<xsl:template match="TABLE">
<xsl:call-template name="nl" />
<xsl:element name="table">
<tbody><xsl:apply-templates select="ROW[CELL]" /></tbody>
</xsl:element>
</xsl:template>

<xsl:template match="ROW">
<tr><xsl:apply-templates /></tr>
</xsl:template>

<xsl:template match="CELL">
<td><xsl:apply-templates /></td>
</xsl:template>

<xsl:template match="SB">
<xsl:call-template name="nl" />
<syntax>
<xsl:apply-templates select="child::node()" mode="syn"/>
<xsl:apply-templates select="following-sibling::*[1][self::SB]" mode="more"/>
</syntax>
<xsl:apply-templates select="following-sibling::*[not(self::SB)][1]" />
</xsl:template>

<xsl:template match="SB" mode="more">
<xsl:call-template name="nl" />
<xsl:apply-templates select="child::node()" mode="syn"/>
<xsl:apply-templates select="following-sibling::*[1][self::SB]" mode="more"/>
</xsl:template>

<xsl:template match="File | Field" mode="syn">
<f><xsl:apply-templates /></f>
</xsl:template>

<xsl:template name="nl"><xsl:text>&#xa;</xsl:text></xsl:template>

</xsl:stylesheet>
--------

The original templates were posted (by Dr Kay, to whom thanks) as an example
of processing elements using "horizontal" recursion: I'll include them here
because someone may instantly see where my adaptation is in error:

--------
<xsl:template match="b">
<b>
  <xsl:copy-of select="."/>
  <xsl:apply-templates 
    select="following-sibling::*[1][self::b]"
    mode="more"/>
</b>
  <xsl:apply-templates 
    select="following-sibling::*[not(self::b)][1]"/>
</xsl:template>

<xsl:template match="b" mode="more">
  <xsl:copy-of select="."/>
  <xsl:apply-templates 
    select="following-sibling::*[1][self::b]"
    mode="more"/>
</xsl:template>
--------

Apart from the obvious edits due to my input repeat being <SB> and my output
"container" being <syntax>, I had to make two other changes. First, the
original template copied the repeated elements (copy-of select="."), which
gave me output XML of <syntax><SB>..</SB><SB>..</SB><SB>..</SB></syntax> and
not <syntax>...</syntax>. Second, when I replaced the copy-of with
<xsl:apply-templates select="child::node()" /> to retain the low-level
structure, I found that the mode="more" was being carried through, hence the
mode="syn" in the stylesheet above.

The resultant output is wrong because instead of giving me one syntax
element containing the content of all the consecutive SB elements, I have as
many syntax elements as SBs, gradually reducing in length, viz:

--------
<?xml version="1.0" encoding="UTF-8"?><document
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<para>If you add this to the enquiry:</para>
<table><tbody><tr><td>
<syntax>where 
<f>client</f> has <f>country</f>="NZ"
  and <f>city</f>="WELLINGTON"
  and <f>name</f> matches (plum* ~plumlee*)
  and val(<f>street</f>[1,pos(" ",<f>street</f>)-1]) &gt; 0
list
<f>client</f>:<f>name</f><f>address</f></syntax>
<syntax><f>client</f> has <f>country</f>="NZ"
  and <f>city</f>="WELLINGTON"
  and <f>name</f> matches (plum* ~plumlee*)
  and val(<f>street</f>[1,pos(" ",<f>street</f>)-1]) &gt; 0
list
<f>client</f>:<f>name</f><f>address</f></syntax>
<syntax>  and <f>city</f>="WELLINGTON"
  and <f>name</f> matches (plum* ~plumlee*)
  and val(<f>street</f>[1,pos(" ",<f>street</f>)-1]) &gt; 0
list
<f>client</f>:<f>name</f><f>address</f></syntax>
<syntax>  and <f>name</f> matches (plum* ~plumlee*)
  and val(<f>street</f>[1,pos(" ",<f>street</f>)-1]) &gt; 0
list
<f>client</f>:<f>name</f><f>address</f></syntax>
<syntax>  and val(<f>street</f>[1,pos(" ",<f>street</f>)-1]) &gt; 0
list
<f>client</f>:<f>name</f><f>address</f></syntax>
<syntax>list
<f>client</f>:<f>name</f><f>address</f></syntax>
<syntax><f>client</f>:<f>name</f><f>address</f></syntax>
</td></tr></tbody></table></document>
--------
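
One possible explanation, which I have not verified: the CELL template uses a bare <xsl:apply-templates />, so every <SB> child is visited in the default mode and each one starts its own <syntax> run. The sketch below is a cut-down stylesheet built on that assumption. It adds an empty template for any <SB> whose immediately preceding sibling element is also an <SB>, and drops the trailing apply-templates on the following non-SB sibling, since the parent already visits every child. All other elements are left to the built-in rules, so it shows only the grouping, not the full output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8"/>

<!-- First SB of a run: open a single <syntax> and pull in the rest of the run -->
<xsl:template match="SB">
<syntax>
<xsl:apply-templates select="child::node()" mode="syn"/>
<xsl:apply-templates select="following-sibling::*[1][self::SB]" mode="more"/>
</syntax>
</xsl:template>

<!-- An SB directly preceded by another SB has already been consumed in
     mode "more", so emit nothing when the parent reaches it in the default
     mode. The predicate gives this pattern a higher default priority than
     the plain match="SB" above. -->
<xsl:template match="SB[preceding-sibling::*[1][self::SB]]"/>

<!-- Later SBs in the run: newline, content, then recurse to the next SB -->
<xsl:template match="SB" mode="more">
<xsl:text>&#xa;</xsl:text>
<xsl:apply-templates select="child::node()" mode="syn"/>
<xsl:apply-templates select="following-sibling::*[1][self::SB]" mode="more"/>
</xsl:template>

<xsl:template match="File | Field" mode="syn">
<f><xsl:apply-templates/></f>
</xsl:template>

</xsl:stylesheet>
```

If that assumption is right, the alternative would be to keep the SB templates as they are and have the parent select only its first child element, as the original "horizontal" recursion seems to expect.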

What am I doing wrong? Can anyone help?

Cheers
Trevor
