Re: [xsl] Flattening parts of a document heirarchy

Subject: Re: [xsl] Flattening parts of a document heirarchy
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Fri, 31 Oct 2003 12:03:59 +0000

Hi Dave,

>> incoming document is something like this:
>> <doc>
>>     text text text
>>     <sec id="sec1">
>>        <p>text1text1text1</p>
>>         <sec id="sec1.1">
>>            <p>text2 text2 text2</p>
>>         </sec>
>>        <p>text3 text3 text3</p>
>>    </sec>
>> </doc>
>> i.e <sec> tags are nested to arbitary level.
>> I need
>> <doc>
>>    text text text
>>    <section id="s1" level="1">
>>        <p>text1 text1 text1</p>
>>    </section>
>>    <section id="s2" level="2">
>>         <p>text text text</p>
>>    </section>
>>    <section id="s3" level="1">
>>        <p>text3 text3 text3</p>
>>    </section>
>> </doc>
> No replies to this, can XSLT really not do this? I've decided to
> address this by pre-filtering using SAX where this kind of transform
> is pretty easy.

If you're happy using SAX to do it, I think you should do so. This
kind of transformation is really suited to a streaming approach, in
which you go through the elements as they appear and insert start and
end tags as appropriate.

You *can* do it in XSLT, by simulating that streaming approach and
stepping through nodes one by one. When you come across a <sec>
element within the <doc> element, only apply templates to its first
child, in flatten mode:

<xsl:template match="sec">
  <xsl:apply-templates select="node()[1]" mode="flatten" />
</xsl:template>

In flatten mode, most nodes should create a <section> element, the
content of which will be the result of applying templates in copy mode
to the node itself. After the <section> element comes the result of
applying templates in flatten mode to the next <sec> element (elements
between this one and the <sec> element will be copied into the section
via the copy mode templates):

<xsl:template match="node()" mode="flatten">
  <section level="{count(ancestor::sec)}">
    <xsl:apply-templates select="." mode="copy" />
  </section>
  <xsl:apply-templates select="following-sibling::sec[1]"
                       mode="flatten" />
</xsl:template>

Processing of <sec> elements in flatten mode is similar to the
processing of <sec> elements in the normal mode: you apply templates
to the first child of the <sec> element in flatten mode, but then you
go on to process the next following sibling of the <sec> element:

<xsl:template match="sec" mode="flatten">
  <xsl:apply-templates select="node()[1]" mode="flatten" />
  <xsl:apply-templates select="following-sibling::node()[1]"
                       mode="flatten" />
</xsl:template>

Processing of most nodes in copy mode is to copy the node itself and
then move on to the next sibling node:

<xsl:template match="node()" mode="copy">
  <xsl:copy-of select="." />
  <xsl:apply-templates select="following-sibling::node()[1]"
                       mode="copy" />
</xsl:template>

Processing of <sec> elements in copy mode is to do nothing:

<xsl:template match="sec" mode="copy" />
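
As a rough sketch, the templates above could be assembled into a
single XSLT 1.0 stylesheet along these lines. The <doc> template and
the xsl:strip-space declaration are assumptions added for
illustration (they are not part of the templates described above);
stripping whitespace-only text nodes keeps a <sec> from producing a
<section> that holds nothing but indentation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: ignore whitespace-only text inside doc and sec -->
  <xsl:strip-space elements="doc sec" />

  <!-- Assumption: recreate the doc wrapper; its text children are
       copied by the built-in template for text nodes -->
  <xsl:template match="doc">
    <doc>
      <xsl:apply-templates />
    </doc>
  </xsl:template>

  <!-- Top-level sec: start stepping through its children -->
  <xsl:template match="sec">
    <xsl:apply-templates select="node()[1]" mode="flatten" />
  </xsl:template>

  <!-- A non-sec node opens a new section, then hands over to the
       next sec sibling (if any) -->
  <xsl:template match="node()" mode="flatten">
    <section level="{count(ancestor::sec)}">
      <xsl:apply-templates select="." mode="copy" />
    </section>
    <xsl:apply-templates select="following-sibling::sec[1]"
                         mode="flatten" />
  </xsl:template>

  <!-- A nested sec: flatten its content, then carry on with
       whatever follows it -->
  <xsl:template match="sec" mode="flatten">
    <xsl:apply-templates select="node()[1]" mode="flatten" />
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="flatten" />
  </xsl:template>

  <!-- Copy this node, then move on to the next sibling -->
  <xsl:template match="node()" mode="copy">
    <xsl:copy-of select="." />
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="copy" />
  </xsl:template>

  <!-- A sec ends the run of nodes being copied -->
  <xsl:template match="sec" mode="copy" />

</xsl:stylesheet>
```

Run against the sample document, this should give three <section>
elements with levels 1, 2 and 1, in document order and without id
attributes.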

Putting IDs that increment sequentially on the <section> elements
would require a different approach that gives me a headache; I'd do it
by post-processing the result of the above transformation to add the
IDs.

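One way such a post-processing pass might look, as a minimal sketch:
an identity transformation run over the output of the first
stylesheet, with one extra template that numbers the <section>
elements. It assumes the first pass has already produced <section>
elements carrying only a level attribute; the "s" prefix simply
follows the ids in the requested output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Add an id based on the section's position in document order -->
  <xsl:template match="section">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <xsl:attribute name="id">
        <xsl:text>s</xsl:text>
        <xsl:number count="section" level="any" />
      </xsl:attribute>
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:number with level="any" counts every <section> up to and
including the current one, the ids come out as s1, s2, s3 and so on.
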
Things are a lot easier in XSLT 2.0, in which you can use the
group-adjacent attribute of <xsl:for-each-group> to group adjacent
nodes by the id of their parent <sec> element:

<xsl:template match="sec">
  <xsl:for-each-group select="descendant::node()
                                [parent::sec and not(self::sec)]"
                      group-adjacent="../@id">
    <section id="s{position()}" level="{count(ancestor::sec)}">
      <xsl:copy-of select="current-group()" />
    </section>
  </xsl:for-each-group>
</xsl:template>

Note that this approach allows you to create the incrementing IDs very
easily as well.
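
For completeness, here is how the XSLT 2.0 version might sit in a
whole stylesheet. This is only a sketch: the <doc> template and the
xsl:strip-space declaration are assumptions added for illustration,
and the grouping key assumes every <sec> has an id attribute
(generate-id(..) would be an alternative key if some do not).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: ignore whitespace-only text inside doc and sec -->
  <xsl:strip-space elements="doc sec" />

  <!-- Assumption: recreate the doc wrapper around the sections -->
  <xsl:template match="doc">
    <doc>
      <xsl:apply-templates />
    </doc>
  </xsl:template>

  <!-- Group every non-sec child of any sec by its parent's id;
       each run of adjacent nodes with the same parent becomes one
       section -->
  <xsl:template match="sec">
    <xsl:for-each-group select="descendant::node()
                                  [parent::sec and not(self::sec)]"
                        group-adjacent="../@id">
      <section id="s{position()}" level="{count(ancestor::sec)}">
        <xsl:copy-of select="current-group()" />
      </section>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

One caveat with this sketch: position() restarts for each top-level
<sec>, so a <doc> holding several top-level <sec> elements would get
repeated ids.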



Jeni Tennison
