[xsl] breaking up a big xml file into smaller xml files

Subject: [xsl] breaking up a big xml file into smaller xml files
From: "Don Stinchfield" <des@xxxxxxxx>
Date: Thu, 18 Mar 2004 10:49:48 -0500
Hello,

Can someone help me to create the correct XSLT for my problem?  Details
follow...

I have a large XML file that I am trying to break up into smaller XML files.
I access parts of the large XML file using document(), and I index into it
with the following XML file.
 
<?xml version="1.0" encoding="UTF-8" ?>
<toc>
  <chapter id="1" name="Overview" author="rpo">
    <section name="Mission Statement" author="rpo" />
    <section name="Around Campus" author="rpo">
      <subsection name="Students and Faculty" author="rpo" />
      <subsection name="The Campus" author="rtdavis" />
      <subsection name="Infinite Corridor" author="rpo" />
    </section>
    <section name="Academic Program" author="lboyd">
      <subsection name="Accreditation" author="rpo" />
    </section>
    <section name="Administrative Organization" author="rpo">
      <subsection name="The Corporation" author="slester" />
      <subsection name="Academic Departments and Divisions" author="lboyd" />
      <subsection name="Faculty" author="lub" />
      <subsection name="Research" author="lsnover" />
      <subsection name="Association of Alumni and Alumnae" author="maggyb" />
    </section>
  </chapter>
</toc>

Using document() I extract the data from the large XML file and store it in
a new file using saxon:output.  Everything's straightforward so far.  Here's
where things start to get complicated.  Each output XML file should contain a
run of consecutive entries from the listing above, taken in document order,
that all have the same author.  Sorry, it's difficult to explain, so here's an
example of the output I'm trying to get.  Given the above listing, the result
of processing should be 10 XML files.

Output file 1 contains the following sections:
  <chapter id="1" name="Overview" author="rpo">
      <section name="Mission Statement" author="rpo" />
      <section name="Around Campus" author="rpo">
        <subsection name="Students and Faculty" author="rpo" />

Output file 2 contains:
        <subsection name="The Campus" author="rtdavis" />

Output file 3:
        <subsection name="Infinite Corridor" author="rpo" />

And so on.

Note, output file 5 contains:
        <subsection name="Accreditation" author="rpo" />
      </section>
      <section name="Administrative Organization" author="rpo">

As you can see from this last example, the sections used to create an output
XML file aren't necessarily on the descendant axis of one another.  The only
restriction is that an output file can only contain sections from within a
single chapter.
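
A minimal XSLT 1.0 sketch of the grouping rule just described, run against
the table-of-contents file, might look like the following.  It only shows
which entries land in which output file: the wrapper elements files, file
and entry and the template name "run" are made up for illustration, and
everything is written to a single result tree instead of separate files via
saxon:output.  It also assumes that chapters are not nested.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- files/file/entry are hypothetical wrapper names, not part of the
         original data -->
    <files>
      <!-- An element starts a new run if it is a chapter, or if the element
           immediately before it in document order has a different author.
           (preceding::* | ancestor::*)[last()] is that previous element. -->
      <xsl:for-each select="//chapter
          | //chapter//*[not(@author = (preceding::* | ancestor::*)[last()]/@author)]">
        <file n="{position()}" author="{@author}">
          <xsl:call-template name="run"/>
        </file>
      </xsl:for-each>
    </files>
  </xsl:template>

  <!-- Emit the given node, then continue with the next element in document
       order as long as it has the same author and is in the same chapter. -->
  <xsl:template name="run">
    <xsl:param name="node" select="."/>
    <entry kind="{name($node)}" name="{$node/@name}"/>
    <xsl:variable name="next"
        select="($node/descendant::* | $node/following::*)[1]"/>
    <xsl:if test="$next/@author = $node/@author
        and generate-id($next/ancestor::chapter)
          = generate-id($node/ancestor-or-self::chapter)">
      <xsl:call-template name="run">
        <xsl:with-param name="node" select="$next"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

For the listing above this should give ten file elements, the fifth holding
the Accreditation and Administrative Organization entries.  To get real
output files, each file wrapper would be replaced by the saxon:output
instruction, and each entry by the matching content pulled from the large
XML file with document().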

My attempt at the XSLT, listed below, isn't even coming close to getting the
job done.  I thought using recursion to create the output XML might be a good
way to walk the tree, but I can't get it to work.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:call-template name="aggregate"/>
  </xsl:template>

  <xsl:template name="aggregate">
    <xsl:call-template name="recurse">
      <xsl:with-param name="nodes" select="descendant-or-self::*"/>
      <xsl:with-param name="author" select="@author"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="recurse">
    <xsl:param name="nodes"/>
    <xsl:param name="author"/>
    <xsl:if test="$nodes[1]/@author=$author">
      <xsl:copy-of
          select="document('file:///d:/TESTING/xslt/data.xml')/html/body/div/*"/>
      <xsl:call-template name="recurse">
        <xsl:with-param name="nodes" select="$nodes[position() != 1]"/>
        <xsl:with-param name="author" select="$author"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>


Finally, here's the data.xml file.

<?xml version="1.0" encoding="UTF-8" ?>
<html>
  <head>
    <title>This is MIT</title>
  </head>
  <body>
    <div name="section">
      <div name="heading">
        <h1>Overview</h1>
      </div>
      <div name="content">
        <p>On February 20, 1865...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h2>Around Campus</h2>
      </div>
      <div name="content">
        <p>The 1998 Task Force on Student Life...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>Students and Faculty</h3>
      </div>
      <div name="content">
        <p>The confluence of ages...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>The Campus</h3>
      </div>
      <div name="content">
        <p>The world...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>Infinite Corridor</h3>
      </div>
      <div name="content">
        <p>For most undergraduates...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>Accreditation</h3>
      </div>
      <div name="content">
        <p>Many degree programs...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h2>Administrative Organization</h2>
      </div>
      <div name="content" />
    </div>
    <div name="section">
      <div name="heading">
        <h3>The Corporation</h3>
      </div>
      <div name="content">
        <p>The Institute's board of trustees...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>Faculty</h3>
      </div>
      <div name="content">
        <p>Educational policy for the Institute...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>Research</h3>
      </div>
      <div name="content">
        <p>The Office of Sponsored Programs...</p>
      </div>
    </div>
    <div name="section">
      <div name="heading">
        <h3>Association of Alumni and Alumnae</h3>
      </div>
      <div name="content">
        <p>The Association of Alumni and Alumnae...</p>
      </div>
    </div>
  </body>
</html>

Any help would be greatly appreciated...

-don 




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

