[xsl] Splitting Excel XML tables using XSLT 1.0?

Subject: [xsl] Splitting Excel XML tables using XSLT 1.0?
From: "Scott Lynch" <slynch@xxxxxxxxxx>
Date: Thu, 22 Feb 2007 17:54:38 -0500
I'm a newbie, so bear with me..this is a long email..

I'm working on a project which requires that the XML output of an Excel
spreadsheet be transformed into another data domain using XSLT 1.0 in
Java 5.0. I've transformed the Excel XML into a form which omits the
various style and formatting data (easier to read and work with), and
now have a document which contains many Worksheets which look something
like this:

   <Worksheet ss:Name="Cover">
      <Table>
         <Row>
            <Cell>
               <Data>INSERT CUSTOMER LOGO HERE</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Succession UA-IP SN09FF</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Network Specification Document</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Customer Name</Data>
            </Cell>
            <Cell>
               <Data>Verizon GNS</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Project Name</Data>
            </Cell>
            <Cell>
               <Data>GNS</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Project Number</Data>
            </Cell>
            <Cell>
               <Data>H4R379</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Region</Data>
            </Cell>
            <Cell>
               <Data>North America</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>End cover sheet</Data>
            </Cell>
         </Row>
      </Table>
   </Worksheet>

What I need to do is transform this file again so that some of these
rows are reformatted and some remain untouched. The rows which span

         <Row>
            <Cell>
               <Data>Customer Name</Data>
            </Cell>
            <Cell>
               <Data>Verizon GNS</Data>
            </Cell>
:
:
          <Row>
            <Cell>
               <Data>Region</Data>
            </Cell>
            <Cell>
               <Data>North America</Data>
            </Cell>
         </Row>

are actually in a subtable on the spreadsheet (visual table only since
Excel doesn't understand subtables in it's XML output) and I would like
to add attributes to the Row and Cell nodes for easier second stage
transformation and data queries. Something like the following:

          <Row subtable="Network Specification Parms">
            <Cell type="parm">
               <Data>Region</Data>
            </Cell>
            <Cell type="value">
               <Data>North America</Data>
            </Cell>
         </Row>


I've been able to transform to this format partially for the addition of
the "subtable" attribute, however, I can't seem to find a way to skip
rows which are handled by the recursive template logic so that they are
not duplicated in the final output. Here's the XSL:


  <xsl:template match="ss:Workbook">
      <xsl:copy>
        <xsl:apply-templates/>
      </xsl:copy>
  </xsl:template>

  <xsl:template match="ss:Worksheet">
    <xsl:if test="@ss:Name='Cover'">
      <xsl:copy>
        <xsl:apply-templates select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <xsl:template match="ss:Table">
    <xsl:copy>
      <xsl:apply-templates select="ss:Row" mode="subtable"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="ss:Row" mode="subtable">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
    <xsl:if test="ss:Cell[ss:Data='Network Specification Document']">
      <xsl:apply-templates select="following-sibling::*[1]"
mode="subtable_label"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="ss:Row" mode="subtable_label">
    <xsl:if test="ss:Cell[ss:Data!='Region'] and ss:Cell[ss:Data!='North
America']">
      <xsl:copy>
        <xsl:attribute name="subtable">Network Specification
Parms</xsl:attribute>
        <xsl:apply-templates/>
      </xsl:copy>
      <xsl:apply-templates select="following-sibling::*[1]"
mode="subtable_label"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="ss:Cell">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="ss:Data">
    <xsl:copy>
      <xsl:value-of select="."/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>


The output looks like this:


   <Worksheet ss:Name="Cover">
      <Table>
         <Row>
            <Cell>
               <Data>INSERT CUSTOMER LOGO HERE</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Succession UA-IP SN09FF</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Network Specification Document</Data>
            </Cell>
         </Row>
         <Row subtable="Network Specification Parms">
            <Cell>
               <Data>Customer Name</Data>
            </Cell>
            <Cell>
               <Data>Verizon GNS</Data>
            </Cell>
         </Row>
:
:
         <Row subtable="Network Specification Parms">
            <Cell>
               <Data>Region</Data>
            </Cell>
            <Cell>
               <Data>North America</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>Customer Name</Data>
            </Cell>
            <Cell>
               <Data>Verizon GNS</Data>
            </Cell>
         </Row>
:
:
         <Row>
            <Cell>
               <Data>Region</Data>
            </Cell>
            <Cell>
               <Data>North America</Data>
            </Cell>
         </Row>
         <Row>
            <Cell>
               <Data>End cover sheet</Data>
            </Cell>
            <Cell>
               <Data>EOS004.A08.02.07</Data>
            </Cell>
         </Row>
      </Table>
   </Worksheet>
</Workbook>

The recursive insertion of the "subtable" attribute "works", but I can't
seem to stop the Rows from being replicated without losing the other
rows outside the subtable. Is there some way to modify my XSL to
accomplish this? Is there a mechanism in XSL 2.0 which could provide
better management of ranges of nodes? Unfortunately, the Excel XML
schema is devoid of any meaning other than spreadsheet layout, so I'm
finding it difficult to come up with a mechanism to handle modifications
of subranges of nodes during my copy transform. If possible, I need a
generic mechanism to handle this sort of Row range transform on other
Worksheets as well, using even more column identifiers than the simple
"parm" and "value" labels I'm using here.

thanks and sorry for the long email,

Scott Lynch

Current Thread