Re: Transforming an incorrectly structured document...

Subject: Re: Transforming an incorrectly structured document...
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 04 Dec 2000 14:20:18 +0000
Ben--

Since this is becoming a FAQ (how to get structure out of a flat document), I thought I'd post my general levitating-a-flat-structure stylesheet. Note that it is designed only for regular structures (that is, flat files whose nesting is "correct" although only implicit, so that an h3, say, never appears without an h2 above it). If you need a more general-purpose solution, one could be built on the same principles, but the key declarations would have to be modified.

Getting the "formatting" (that is, the transformation of element types) along with the conversion (the levitation) is not hard, only a matter of adapting and supplementing these templates.

I hope it helps --
Wendell

INPUT FILE:
<levels>
<h1>Header 1</h1>
<stuff>Content under 1</stuff>
<h2>header 1.1</h2>
<stuff>Content under 1.1</stuff>
<stuff>More content under 1.1</stuff>
<h2>header 1.2</h2>
<stuff>Content under 1.2</stuff>
<h2>header 1.3</h2>
<stuff>Content under 1.3</stuff>
<h3>header 1.3.1</h3>
<stuff>Content under 1.3.1</stuff>
<h4>header 1.3.1.1</h4>
<stuff>Content under 1.3.1.1</stuff>
<h5>header 1.3.1.1.1</h5>
<stuff>Content under 1.3.1.1.1</stuff>
<h1>Header 2</h1>
<stuff>Content under 2</stuff>
<h2>header 2.1</h2>
<stuff>Content under 2.1 </stuff>
<h2>header 2.2</h2>
<stuff>Content under 2.2</stuff>
<h3>header 2.2.1</h3>
<stuff>Content under 2.2.1</stuff>
<h2>header 2.3</h2>
<stuff>Content under 2.3</stuff>
</levels>

STYLESHEET:
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

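<!-- This key should match any 'stuff-level' elements you have, such
     as paragraphs, lists etc. Each one is indexed by the id of the
     last node (in document order) among its parent and its preceding
     h1-h5 siblings, i.e. the nearest header above it, or the parent
     if no header precedes it. -->
<xsl:key name='stuffchildren' match="stuff"
  use="generate-id((..|preceding-sibling::h1|preceding-sibling::h2|preceding-sibling::h3|preceding-sibling::h4|preceding-sibling::h5)[last()])"/>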

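<!-- Each header key below indexes a header by the id of the nearest
     preceding header one level up, which becomes its parent section. -->
<xsl:key name="h2children" match="h2"
  use="generate-id(preceding-sibling::h1[1])"/>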

<xsl:key name="h3children" match="h3"
  use="generate-id(preceding-sibling::h2[1])"/>

<xsl:key name="h4children" match="h4"
  use="generate-id(preceding-sibling::h3[1])"/>

<xsl:key name="h5children" match="h5"
  use="generate-id(preceding-sibling::h4[1])"/>

<xsl:template match="levels">
  <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
  <xsl:apply-templates select="h1"/>
</xsl:template>

<xsl:template match="h1">
  <section level="1">
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
    <xsl:apply-templates select="key('h2children', generate-id())"/>
  </section>
</xsl:template>

<xsl:template match="h2">
  <section level="2">
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
    <xsl:apply-templates select="key('h3children', generate-id())"/>
  </section>
</xsl:template>

<xsl:template match="h3">
  <section level="3">
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
    <xsl:apply-templates select="key('h4children', generate-id())"/>
  </section>
</xsl:template>

<xsl:template match="h4">
  <section level="4">
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
    <xsl:apply-templates select="key('h5children', generate-id())"/>
  </section>
</xsl:template>

<xsl:template match="h5">
  <section level="5">
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
  </section>
</xsl:template>

<xsl:template match="stuff">
  <data><xsl:apply-templates/></data>
</xsl:template>

</xsl:stylesheet>

RESULT (using SAXON):
<?xml version="1.0" encoding="utf-8"?>
<section level="1">
   <head>Header 1</head>
   <data>Content under 1</data>
   <section level="2">
      <head>header 1.1</head>
      <data>Content under 1.1</data>
      <data>More content under 1.1</data>
   </section>
   <section level="2">
      <head>header 1.2</head>
      <data>Content under 1.2</data>
   </section>
   <section level="2">
      <head>header 1.3</head>
      <data>Content under 1.3</data>
      <section level="3">
         <head>header 1.3.1</head>
         <data>Content under 1.3.1</data>
         <section level="4">
            <head>header 1.3.1.1</head>
            <data>Content under 1.3.1.1</data>
            <section level="5">
               <head>header 1.3.1.1.1</head>
               <data>Content under 1.3.1.1.1</data>
            </section>
         </section>
      </section>
   </section>
</section>
<section level="1">
   <head>Header 2</head>
   <data>Content under 2</data>
   <section level="2">
      <head>header 2.1</head>
      <data>Content under 2.1 </data>
   </section>
   <section level="2">
      <head>header 2.2</head>
      <data>Content under 2.2</data>
      <section level="3">
         <head>header 2.2.1</head>
         <data>Content under 2.2.1</data>
      </section>
   </section>
   <section level="2">
      <head>header 2.3</head>
      <data>Content under 2.3</data>
   </section>
</section>
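
On the more general-purpose case mentioned at the top: here is a minimal sketch of one way the key declarations might be modified, cut down to three levels. It assumes the only irregularity is a skipped level (an h3 sitting directly under an h1), and the key name "subheads" is invented for this illustration. A header that appears before the first h1 would still be dropped, so treat this as a starting point rather than a finished solution.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- same idea as the stylesheet above, limited to h1-h3 -->
<xsl:key name="stuffchildren" match="stuff"
  use="generate-id((..|preceding-sibling::h1|preceding-sibling::h2|preceding-sibling::h3)[last()])"/>

<!-- XSLT 1.0 allows several xsl:key declarations with one name.
     'subheads' (a made-up name) indexes each header by its parent header. -->
<xsl:key name="subheads" match="h2"
  use="generate-id(preceding-sibling::h1[1])"/>

<!-- an h3 attaches to the nearest preceding h1 OR h2, whichever comes
     last in document order, so an h3 with no h2 above it in its own
     h1 section lands under that h1 instead of under a stale h2 -->
<xsl:key name="subheads" match="h3"
  use="generate-id((preceding-sibling::h1|preceding-sibling::h2)[last()])"/>

<xsl:template match="levels">
  <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
  <xsl:apply-templates select="h1"/>
</xsl:template>

<!-- one template for all header levels; the level number is taken
     from the element name (h2 gives 2) -->
<xsl:template match="h1|h2|h3">
  <section level="{substring(local-name(), 2)}">
    <head>
      <xsl:apply-templates/>
    </head>
    <xsl:apply-templates select="key('stuffchildren', generate-id())"/>
    <xsl:apply-templates select="key('subheads', generate-id())"/>
  </section>
</xsl:template>

<xsl:template match="stuff">
  <data><xsl:apply-templates/></data>
</xsl:template>

</xsl:stylesheet>
```

On a regular input such as the one above (with its h4 and h5 sections removed), this should give the same nesting as the original stylesheet. The difference only shows when a level is skipped.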

Hope that helps!

At 01:08 PM 12/1/00 -0800, you wrote:

...The XML is as follows:

<div>
<p />
<h1>Overview</h1>
<p>Use Slater guards when a riser is subject to minor damage, such as in a
walkway. Use split casings and/or protection posts when a riser is subject to
heavy damage, such as in a</p>
<h1>Factors</h1>
<h2>What to look for</h2>
<p>lane or docking bay.</p>
<p>Slater guards and split casings come in 1 meter lengths, but may be cut or
joined as necessary.</p>
<p>Order the split casing attaching lugs separately, and weld them on.</p>


The ideal would be to structure the document as follows:

<div><h1>Overview</h1><p>Use Slater guards when a riser is subject to minor
damage, such as in a walkway. Use split casings and/or protection posts when a
riser is subject to heavy damage, such as in a</p></div>

<div><h1>Factors</h1><h2>What to look for</h2><p>lane or docking
bay.</p><p>Slater guards and split casings come in 1 meter lengths, but may be
cut or joined as necessary.</p><p>Order the split casing attaching lugs
separately, and weld them on.</p></div>
...
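
For the fragment quoted above, the same key technique can be cut down to a single level. This is a rough sketch, assuming that the outer div holds all the flat content and that every new div should start at an h1. The key name "h1content" is made up for the example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- index every non-h1 child of the flat div by the id of the
     nearest preceding h1 ('h1content' is a hypothetical name) -->
<xsl:key name="h1content" match="div/*[not(self::h1)]"
  use="generate-id(preceding-sibling::h1[1])"/>

<!-- the outer div is not copied; each h1 starts a new div instead -->
<xsl:template match="div">
  <xsl:apply-templates select="h1"/>
</xsl:template>

<xsl:template match="h1">
  <div>
    <xsl:copy-of select="."/>
    <!-- everything up to the next h1, in document order -->
    <xsl:copy-of select="key('h1content', generate-id())"/>
  </div>
</xsl:template>

</xsl:stylesheet>
```

Anything that precedes the first h1 (here the empty p) has no h1 to attach to and is dropped, which happens to match the desired output shown. As with the stylesheet above, the result is a sequence of sibling elements with no single root.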

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


