Re: [xsl] Flatten XML to load into Oracle DB

Subject: Re: [xsl] Flatten XML to load into Oracle DB
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 10 Feb 2004 14:16:30 -0500
Hi Max,

It appears that you are flattening your hierarchy with one row for every FROMDATE element (multiple FROMDATE elements can share a single THEKEY parent).

The easiest way to achieve this is to start by "reaching in" and selecting these FROMDATE descendants of the root:

<xsl:template match="/">
  <ROWSET>
    <xsl:apply-templates select="//FROMDATE"/>
  </ROWSET>
</xsl:template>

Having done this, you will need a template matching the FROMDATE elements selected, which generates each row:

<xsl:template match="FROMDATE">
<ROW>
<xsl:attribute name="num">
<xsl:number level="any"/>
<!-- xsl:number is a convenient and robust way of generating the row number -->
</xsl:attribute>
<!-- ... then, more logic ... -->
</ROW>
</xsl:template>


(You could have <ROW num="{position()}"> here but it would be more fragile in the face of future changes to the stylesheet, if that's an issue.)

So far so good -- but how to fill in "more logic"?

At this point there are two general approaches. Since your target element structure is very regular, you could achieve this once again by "pulling" the relevant values from your source. This would be easy enough to do inside the same template as:

      <THEKEY>
        <!-- retrieve the needed value from the parent node, THEKEY -->
        <xsl:value-of select="../text()"/>
      </THEKEY>
      <FROMDATE>
        <!-- likewise, except from the context node, the matched FROMDATE -->
        <xsl:value-of select="text()"/>
      </FROMDATE>
      <type>
        <!-- likewise... -->
        <xsl:value-of select="type/text()"/>
      </type>
   ... etc ...

Alternatively, a somewhat more sophisticated approach would be more easily maintainable and extensible (maybe you want to continue using this stylesheet with other data structures similar to but not identical with what you have here): continue using templates. In this case, using a mode, as your first effort does, is a good idea. So instead of the literal result elements above you'd do:

<xsl:apply-templates
select="parent::THEKEY | self::FROMDATE | type | type/*" mode="flatten"/>


...which selects the parent, the FROMDATE node itself, the "type" child node, and its children, to process them using templates in the "flatten" mode. Note that each of these will be processed in turn, in document order, but not nested inside one another (unless logic in the "flatten" templates you invoke here does some nesting) -- thereby achieving your flattening.

Templates in this mode can simply write out the elements you need as in:

<xsl:template match="THEKEY" mode="flatten">
  <xsl:copy>
    <xsl:value-of select="text()"/>
  </xsl:copy>
</xsl:template>

or even, for the simpler cases,

<xsl:template match="SOMEGROUP" mode="flatten">
  <!-- this node can simply be copied over -->
  <xsl:copy-of select=".">
</xsl:template>

(I won't write them all out, but leave that for you to figure.)

I hope this isn't too cryptic! If it is, please ask about anything you find confusing.

Cheers,
Wendell

At 01:10 PM 2/10/2004, you wrote:
Hello.
I have an XML file like this
<?xml version="1.0" encoding="ISO-8859-1"?>
<procind>F
   <timedate>20021119-08:26:59
      <group>group3
         <THEKEY>001001
            <FROMDATE>19851231
               <type>L
                  <SOMEGROUP>2530</SOMEGROUP>
                  <GIND>253010</GIND>
                  <SECTOR>25</SECTOR>
                  <SUBIND>25301040</SUBIND>
                  <TODATE>19860731</TODATE>
               </type>
            </FROMDATE>
         </THEKEY>
         <THEKEY>001003
            <FROMDATE>19860131
               <type>L
                  <SOMEGROUP>2520</SOMEGROUP>
                  <GIND>252010</GIND>
                  <SECTOR>25</SECTOR>
                  <SUBIND>25201020</SUBIND>
                  <TODATE>19870130</TODATE>
               </type>
            </FROMDATE>
            <FROMDATE>19870131
               <type>L
                  <SOMEGROUP>2550</SOMEGROUP>
                  <GIND>255030</GIND>
                  <SECTOR>25</SECTOR>
                  <SUBIND>25503020</SUBIND>
                  <TODATE>19880130</TODATE>
               </type>
            </FROMDATE>
         </THEKEY>
      </group>
   </timedate>
</procind>

, which I need to load into Oracle DB. It is very easy if I have following XML:

<ROWSET>
   <ROW num="1">
      <THEKEY>001001</THEKEY>
      <FROMDATE>19851231</FROMDATE>
      <type>L</type>
      <SOMEGROUP>2530</SOMEGROUP>
      <GIND>253010</GIND>
      <SECTOR>25</SECTOR>
      <SUBIND>25301040</SUBIND>
      <TODATE>19860731</TODATE>
   </ROW>
   <ROW num="2">
      <THEKEY>001003</THEKEY>
      <FROMDATE>19860131</FROMDATE>
      <type>L</type>
      <SOMEGROUP>2520</SOMEGROUP>
      <GIND>252010</GIND>
      <SECTOR>25</SECTOR>
      <SUBIND>25201020</SUBIND>
      <TODATE>19870130</TODATE>
   </ROW>
   <ROW num="3">
      <THEKEY>001003</THEKEY>
      <FROMDATE>19870131</FROMDATE>
      <type>L</type>
      <SOMEGROUP>2550</SOMEGROUP>
      <GIND>255030</GIND>
      <SECTOR>25</SECTOR>
      <SUBIND>25503020</SUBIND>
      <TODATE>19880130</TODATE>
   </ROW>
</ROWSET>

With XSL

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"; version="1.0">
<!--xsl:import href="copy.xsl"/-->
<xsl:output method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="yes" indent="yes"/>


   <xsl:template match="/">
      <ROWSET>
      <xsl:apply-templates select="//THEKEY"/>
      </ROWSET>
   </xsl:template>

<xsl:template match="*" mode="flatten">
<xsl:choose>
<xsl:when test="*">
<xsl:element name="ROW">
<xsl:if test="name()='FROMDATE'">
<xsl:attribute name="num"><xsl:number level="any"/></xsl:attribute>
</xsl:if>
<xsl:apply-templates mode="flatten"/>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>


   <xsl:template match="text()" mode="flatten">
      <xsl:if test="normalize-space(.) != ''">
         <xsl:element name="{name(..)}">
            <xsl:value-of select="normalize-space(.)"/>
         </xsl:element>
      </xsl:if>
   </xsl:template>

   <xsl:template match="THEKEY">
      <xsl:apply-templates mode="flatten"/>
   </xsl:template>

</xsl:stylesheet>

I got

<ROWSET>
    <THEKEY>001001</THEKEY>
    <ROW num="1">
        <FROMDATE>19851231</FROMDATE>
        <ROW>
            <type>I</type>
            <SOMEGROUP>2530</SOMEGROUP>
            <GIND>253010</GIND>
            <SECTOR>25</SECTOR>
            <SUBIND>25301040</SUBIND>
            <TODATE>19860731</TODATE>
        </ROW>
    </ROW>
    <THEKEY>001003</THEKEY>
    <ROW num="2">
        <FROMDATE>19860131</FROMDATE>
        <ROW>
            <type>I</type>
            <SOMEGROUP>2520</SOMEGROUP>
            <GIND>252010</GIND>
            <SECTOR>25</SECTOR>
            <SUBIND>25201020</SUBIND>
            <TODATE>19870130</TODATE>
        </ROW>
    </ROW>
    <ROW num="3">
        <FROMDATE>19870131</FROMDATE>
        <ROW>
            <type>I</type>
            <SOMEGROUP>2550</SOMEGROUP>
            <GIND>255030</GIND>
            <SECTOR>25</SECTOR>
            <SUBIND>25503020</SUBIND>
            <TODATE>19880130</TODATE>
        </ROW>
    </ROW>
</ROWSET>

Can anyone give me a hand with this?


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list



Current Thread