RE: [xsl] Taking flat XML and parsing into multi level nested

Subject: RE: [xsl] Taking flat XML and parsing into multi level nested
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Wed, 08 Aug 2007 11:51:58 +0100
> I have some horrible pre-generated source XML which is in this form:
> 
> <item>Item Name One</item>
> <categoryStart>Category Name One</categoryStart>
> <item>Item Name Two</item>
> <item>Item Name Three</item>
> <categoryStart>Category Name Two</categoryStart>
> <item>Item Name Four</item>
> <categoryEnd>Category Name Two</categoryEnd>
> <item>Item Name Five</item>
> <categoryEnd>Category Name One</categoryEnd>
> <item>Item Name Six</item>

In XSLT 2.0:

<xsl:template name="do-grouping">
  <xsl:param name="input" as="element()*">
  <xsl:for-each-group select="*" group-starting-with="categoryStart">
    <xsl:for-each-group select="current-group()"
group-ending-with="categoryEnd">
    <xsl:choose>
      <xsl:when test="current-group()[1][self:categoryStart]">
        <group>
          <xsl:call-template name="do-grouping">
            <xsl:with-param select="current-group()[self::item]"/>
          </xsl:call-template>
        </group>
      </xsl:when>
      <xsl:when test="current-group()[self:categoryStart]">
          <xsl:call-template name="do-grouping">
            <xsl:with-param select="current-group()"/>
          </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="current-group()"/>
      </xsl:otherwise>
    </xsl:choose>
    </xsl:for-each-group>
  </xsl:for-each-group>
</xsl:template>

Not tested.

I'm afraid doing a 1.0 solution is pure masochism, so I'll leave that to
others.

Michael Kay
http://www.saxonica.com/
    

> 
> Now, in the destination XML, the categories are also items, 
> which just indicate another level of nesting, and so the 
> above needs to be transformed to something along these lines:
> 
> <item>
>     <title>Item Name One</title>
> </item>
> <group>
>     <title>Category Name One</title>
>     <item>
>         <title>Item Name Two</title>
>     </item>
>     <item>
>         <title>Item Name Three</title>
>     </item>
>     <group>
>         <title>Category Name Two</title>
>         <item>
>             <title>Item Name Four</title>
>         </item>
>     </group>
>     <item>
>         <title>Item Name Five</title>
>     </item>
> </group>
> <item>
>     <title>Item Name Six</title>
> </item>
> 
> The way I began to approach this was to use a for-each and 
> then a choose, opening the item tag when I found a 
> categoryStart and closing on categoryEnd. But the parser 
> complained about the XML not being well formed, even though 
> it would have been as an end result.
> 
> So next I tried a recursive call-template, something like:
> 
> <xsl:template name="parseCategoryItems">
>     <xsl:param name="nodes" />
>     <xsl:for-each select="$nodes">
>         <xsl:choose>
>             <xsl:when test="name() = 'item'">
>                 <item identifier="ITEM{position()}">
>                     <title><xsl:value-of select="." /></title>
>                 </item>
>             </xsl:when>
>             <xsl:when test="name() = 'categoryStart'">
>                 <item identifier="CITEM{position()}">
>                     <xsl:call-template name="parseCategoryItems">
>                         <xsl:with-param name="nodes"
> select="following-sibling::*[.!=??]" />
>                     </xsl:call-template>
>                 </item>
>             </xsl:when>
>         </xsl:choose>
>     </xsl:for-each>
> </xsl:template>
> 
> All of this is being processed from VBScript in a Word 
> document, using XSLT 1.0.
> 
> First off, I'm not sure how to stop at the correct category 
> end. When I recurse, I need to select all the nodes 
> between the current node and its matching categoryEnd 
> node. Unfortunately, because the source is completely flat, I 
> can't use a normal axis selector. I sort of need to be able 
> to say "select all following siblings *until* we see a 
> categoryEnd with the same value as the current node". At the 
> moment the best I have managed is selecting all that are *not* 
> a categoryEnd, which obviously also picks up the nodes after the matching end.
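
In XSLT 1.0 the "until" part can be written as a node-set intersection: the following siblings of a categoryStart that are also preceding siblings of its matching categoryEnd. The sketch below assumes the flat elements sit under a single root element (the sample has none) and that category names are unique, so the matching end is simply the first following categoryEnd with the same text. The <report> and <category> output elements are made up purely to show the selection; not tested with MSXML.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the flat list is wrapped in one root element.
       <report> is a hypothetical output element for illustration. -->
  <xsl:template match="/*">
    <report>
      <xsl:apply-templates select="categoryStart"/>
    </report>
  </xsl:template>

  <xsl:template match="categoryStart">
    <!-- Matching end: first following categoryEnd with the same text
         (assumes category names are unique) -->
    <xsl:variable name="end"
        select="following-sibling::categoryEnd[. = current()][1]"/>
    <!-- Every sibling that comes before that end -->
    <xsl:variable name="beforeEnd" select="$end/preceding-sibling::*"/>
    <!-- Intersection: siblings after this start AND before its end -->
    <xsl:variable name="between"
        select="following-sibling::*[count(. | $beforeEnd) = count($beforeEnd)]"/>
    <!-- Hypothetical output: just report how many nodes were selected -->
    <category name="{.}" count="{count($between)}"/>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input this selects 6 nodes for "Category Name One" (Two, Three, the nested start, Four, the nested end, Five) and 1 for "Category Name Two". Feeding $between back into a for-each-based recursion still visits the nested items twice, though, so on its own it does not fix the second problem.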
> 
> Secondly, I need to *not* process nodes that have already been done.
> To clarify: when I run what I have now, it nests the 
> items (all the following siblings, as I don't know how to 
> select correctly) *and* it prints them again below the nested 
> version. So basically, is there a way to remove them from 
> the loop when I return from the recursive call?
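
A common XSLT 1.0 way round both problems is sibling recursion: instead of looping over a node-set, each template handles one node and then applies templates to its immediate following sibling, so every node is visited exactly once. A categoryStart wraps the walk of its contents in a group element, the walk stops when it reaches a categoryEnd, and processing then resumes after the matching categoryEnd. This is a minimal sketch, assuming a single root element around the flat list and unique category names (the end is matched by its text); the mode name "walk" is arbitrary. Untested under MSXML.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: one root element; copy it and start the walk
       at its first child -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="*[1]" mode="walk"/>
    </xsl:copy>
  </xsl:template>

  <!-- item: emit it, then move on to the next sibling -->
  <xsl:template match="item" mode="walk">
    <item>
      <title><xsl:value-of select="."/></title>
    </item>
    <xsl:apply-templates select="following-sibling::*[1]" mode="walk"/>
  </xsl:template>

  <!-- categoryStart: walk its contents inside a group, then resume
       with the sibling after the matching categoryEnd
       (matched by text, assuming unique category names) -->
  <xsl:template match="categoryStart" mode="walk">
    <group>
      <title><xsl:value-of select="."/></title>
      <xsl:apply-templates select="following-sibling::*[1]" mode="walk"/>
    </group>
    <xsl:apply-templates
        select="following-sibling::categoryEnd[. = current()][1]/following-sibling::*[1]"
        mode="walk"/>
  </xsl:template>

  <!-- categoryEnd: stop this level; the enclosing categoryStart
       template carries on after it -->
  <xsl:template match="categoryEnd" mode="walk"/>
</xsl:stylesheet>
```

On the sample input this gives, inside a copy of the root element: item One, a group titled "Category Name One" holding items Two and Three, a nested group with item Four, and item Five, then item Six. The same templates also run under an XSLT 2.0 processor. Because each step is a recursive call, a very long flat list means correspondingly deep recursion.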
> 
> I've had to simplify the examples from what I really have, 
> but if anyone can give me any hints on how to progress, 
> including completely different approaches, then that would be 
> fantastic, because I am currently out of ideas.
> 
> Many thanks,
> Paul
