Re: [xsl] Muenchian Method Doesn't go far enough for me

Subject: Re: [xsl] Muenchian Method Doesn't go far enough for me
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 17 Oct 2002 08:02:19 -0400
At 2002-10-16 17:15 -0700, Richard Lander wrote:
> I'm having some trouble with grouping.

I don't think you have a grouping problem to be solved.


> I've moved to the Muenchian method, which makes good sense to me.

> A similar problem to mine would be writing a transform to convert flat HTML sections to hierarchical sections.

> In HTML, you might have:
>
> H1
> H2
> H2
> H3
> H2
> H1
> H2
>
> Let's remap it to:
> <test>
> <section>
>         <title>H1</title>
> </section>
> <section>
>         <title>H2</title>
> </section>

I think at this point you may have made an assumption for your algorithm that won't make sense in the real world of converting a flat HTML file into a set of hierarchical sections.


> <xslt:key name="sections" match="section" use="title"/>

You are basing your structure inference on moving the heading level into the titles of the remapped sibling sections, but in doing so you haven't preserved the original heading text. My concern is that taking your algorithm "the next step", to a real HTML file, would fall apart if you based your keys on the titles in your sibling section structure, because real titles carry the heading text, not the heading level.
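One way to see the risk, as a rough Python sketch (standard library only; the chapter data and the `group_by` helper are made up for illustration): keying on title text, in the spirit of `use="title"`, puts two different sections that happen to share a heading into the same bucket, whereas keying on node identity, which is what `generate-id()` provides, keeps them apart. A list position stands in for `generate-id()` here.

```python
from collections import defaultdict

# Hypothetical flattened sections as (heading tag, title text) pairs.
# The two "Overview" sections belong to different chapters.
sections = [
    ("h1", "Chapter A"), ("h2", "Overview"),
    ("h1", "Chapter B"), ("h2", "Overview"),
]


def group_by(items, key):
    """Bucket item positions by key(position, item), like an xsl:key table."""
    table = defaultdict(list)
    for pos, item in enumerate(items):
        table[key(pos, item)].append(pos)
    return dict(table)


# Keyed on title text, in the spirit of use="title".
by_title = group_by(sections, lambda pos, item: item[1])
# Keyed on node identity; the position stands in for generate-id().
by_identity = group_by(sections, lambda pos, item: pos)

print(by_title["Overview"])              # [1, 3]: two sections, one bucket
print(by_identity[1], by_identity[3])    # [1] [3]: kept apart
```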


When I've had to go from flat HTML to hierarchical sections, the keys approach I've used doesn't use the Muenchian method at all; it is just another application of keys, indexing each heading by the generated identifier of the nearest preceding sibling heading one level up.
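As a rough illustration of what such keys compute, here is a Python sketch (standard library `xml.etree.ElementTree`; `build_keys` and the sample document are hypothetical). Each heading is filed under the nearest preceding sibling heading one level up, mirroring `use="generate-id(preceding-sibling::h1[1])"` and its siblings in the stylesheet below, with the sibling position standing in for `generate-id()`. A lookup then returns the sub-headings directly, with no Muenchian de-duplication step.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


def build_keys(body):
    """Index each hN under its nearest preceding h(N-1) sibling.

    Mirrors use="generate-id(preceding-sibling::h(N-1)[1])", with the
    sibling position standing in for generate-id(). The (level, parent)
    tuple plays the role of the separate h2s..h6s key names.
    """
    children = list(body)
    keys = defaultdict(list)   # (level, parent position) -> child positions
    last_seen = {}             # heading level -> position of latest heading
    for pos, el in enumerate(children):
        if el.tag in HEADINGS:
            level = int(el.tag[1])
            if level > 1 and level - 1 in last_seen:
                keys[(level, last_seen[level - 1])].append(pos)
            last_seen[level] = pos
    return children, keys


body = ET.fromstring(
    "<body><h1>A</h1><p>a</p><h2>A.1</h2><h2>A.2</h2><h3>A.2.1</h3>"
    "<h1>B</h1><h2>B.1</h2></body>"
)
children, keys = build_keys(body)
# Like key('h2s', generate-id(first h1)): the h2 headings under "A".
print([children[p].text for p in keys[(2, 0)]])   # ['A.1', 'A.2']
```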

I've copied below a generic solution that recognizes the six HTML heading levels as sibling elements and infers structure for the elements between the headings. It also uses a far simpler test for detecting when processing has reached the next sibling heading: delete the digits 1 through 6 from the element name and check whether just "h" remains.
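A quick Python check of that heading test (the `is_heading` name is hypothetical). XPath `translate()` with an empty third argument deletes every character listed in the second, and `str.translate` with a deletion table behaves the same way.

```python
# Emulate XPath translate(local-name(.), '123456', ''): characters listed in
# the second argument with no counterpart in the (empty) third are deleted.
STRIP_DIGITS = str.maketrans("", "", "123456")


def is_heading(local_name: str) -> bool:
    """True when deleting the digits 1-6 leaves exactly 'h', as for h1..h6."""
    return local_name.translate(STRIP_DIGITS) == "h"


for name in ["h1", "h6", "p", "hr", "head", "h7"]:
    print(name, is_heading(name))
```

The test would also accept a name such as `h12`, but HTML has no such element, so that is harmless here.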

I'm not sure whether the following will help you, but you may have dug yourself quite a hole by "remapping" the input HTML into your sibling section structure. I've beefed up the test data so you can test the handling of the intervening elements.
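The mode="next" template in the stylesheet below handles those intervening elements one sibling at a time, applying itself to `following-sibling::*[1]` until it meets a heading, since an `xsl:for-each` in XSLT 1.0 cannot stop early. A minimal Python equivalent, assuming the body's children are already in a list (`content_after` is a hypothetical name), replaces that recursion with a loop:

```python
import xml.etree.ElementTree as ET

STRIP_DIGITS = str.maketrans("", "", "123456")


def is_heading(tag):
    # The stop condition of mode="next": h1..h6 reduce to "h".
    return tag.translate(STRIP_DIGITS) == "h"


def content_after(children, heading_pos):
    """Return the siblings after children[heading_pos], up to the next heading."""
    collected = []
    pos = heading_pos + 1
    while pos < len(children) and not is_heading(children[pos].tag):
        collected.append(children[pos])
        pos += 1
    return collected


body = ET.fromstring(
    "<body><h1>H1</h1><p>one</p><ul><li>two</li></ul><h2>H2</h2><p>three</p></body>"
)
children = list(body)
print([el.tag for el in content_after(children, 0)])   # ['p', 'ul']
print([el.tag for el in content_after(children, 3)])   # ['p']
```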

I hope this helps.

............. Ken


t:\ftemp>type rich.xml
<html>
<body>
<h1>H1 title 1</h1>
<p>P for H1-1</p>
<h2>H2 title 1.1</h2>
<p>P for H2-1.1</p>
<h2>H2 title 1.2</h2>
<p>P for H2-1.2</p>
<h3>H3 title 1.2.1</h3>
<p>P for H3-1.2.1</p>
<h4>H4 title 1.2.1.1</h4>
<p>P for H4-1.2.1.1</p>
<h5>H5 title 1.2.1.1.1</h5>
<p>P for H5-1.2.1.1.1</p>
<h6>H6 title 1.2.1.1.1.1</h6>
<p>P for H6-1.2.1.1.1.1</p>
<h2>H2 title 1.3</h2>
<p>P for H2-1.3</p>
<h1>H1 title 2</h1>
<p>P for H1-2</p>
<h2>H2 title 2.1</h2>
<p>P for H2-2.1</p>
</body>
</html>

t:\ftemp>type rich.xsl
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:key name="h2s" match="h2" use="generate-id(preceding-sibling::h1[1])"/>
<xsl:key name="h3s" match="h3" use="generate-id(preceding-sibling::h2[1])"/>
<xsl:key name="h4s" match="h4" use="generate-id(preceding-sibling::h3[1])"/>
<xsl:key name="h5s" match="h5" use="generate-id(preceding-sibling::h4[1])"/>
<xsl:key name="h6s" match="h6" use="generate-id(preceding-sibling::h5[1])"/>

<xsl:output indent="yes"/>

<!--infer structure from sibling headings-->
<xsl:template match="body">
  <test>
    <xsl:for-each select="h1">
      <section>
        <title><xsl:apply-templates/></title>
        <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
        <xsl:for-each select="key('h2s',generate-id(.))">
          <section>
            <title><xsl:apply-templates/></title>
            <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
            <xsl:for-each select="key('h3s',generate-id(.))">
              <section>
                <title><xsl:apply-templates/></title>
                <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
                <xsl:for-each select="key('h4s',generate-id(.))">
                  <section>
                    <title><xsl:apply-templates/></title>
                    <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
                    <xsl:for-each select="key('h5s',generate-id(.))">
                      <section>
                        <title><xsl:apply-templates/></title>
                        <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
                        <xsl:for-each select="key('h6s',generate-id(.))">
                          <section>
                            <title><xsl:apply-templates/></title>
                            <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
                          </section>
                        </xsl:for-each>
                      </section>
                    </xsl:for-each>
                  </section>
                </xsl:for-each>
              </section>
            </xsl:for-each>
          </section>
        </xsl:for-each>
      </section>
    </xsl:for-each>
  </test>
</xsl:template>


<!--process each sibling in order until the next heading level-->

<xsl:template match="*" mode="next">
  <xsl:if test="not( translate( local-name(.),'123456','' ) = 'h' )">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::*[1]" mode="next"/>
  </xsl:if>
</xsl:template>

<!--process contents of each node as a typical template rule in unnamed mode-->

<xsl:template match="p">
  <para><xsl:apply-templates/></para>
</xsl:template>

</xsl:stylesheet>

t:\ftemp>saxon -o rich.out rich.xml rich.xsl

t:\ftemp>type rich.out
<?xml version="1.0" encoding="utf-8"?>

<test>
   <section>
      <title>H1 title 1</title>
      <para>P for H1-1</para>
      <section>
         <title>H2 title 1.1</title>
         <para>P for H2-1.1</para>
      </section>
      <section>
         <title>H2 title 1.2</title>
         <para>P for H2-1.2</para>
         <section>
            <title>H3 title 1.2.1</title>
            <para>P for H3-1.2.1</para>
            <section>
               <title>H4 title 1.2.1.1</title>
               <para>P for H4-1.2.1.1</para>
               <section>
                  <title>H5 title 1.2.1.1.1</title>
                  <para>P for H5-1.2.1.1.1</para>
                  <section>
                     <title>H6 title 1.2.1.1.1.1</title>
                     <para>P for H6-1.2.1.1.1.1</para>
                  </section>
               </section>
            </section>
         </section>
      </section>
      <section>
         <title>H2 title 1.3</title>
         <para>P for H2-1.3</para>
      </section>
   </section>
   <section>
      <title>H1 title 2</title>
      <para>P for H1-2</para>
      <section>
         <title>H2 title 2.1</title>
         <para>P for H2-2.1</para>
      </section>
   </section>
</test>

t:\ftemp>rem Done!
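For comparison, a Python sketch of the whole transformation (standard library only; every function name is hypothetical). It swaps the six nested `xsl:for-each` blocks for one recursive function that reads the level from the element name. Like the `h2s`..`h6s` keys, it files each sub-heading under the nearest preceding heading one level up, so it shares the stylesheet's assumption that headings never skip a level. Only `<p>` is mapped, to `<para>`, as in the stylesheet; other intervening elements are simply skipped here.

```python
import xml.etree.ElementTree as ET

STRIP_DIGITS = str.maketrans("", "", "123456")


def is_heading(tag):
    # Same test as the mode="next" template: deleting 1-6 from h1..h6 leaves "h".
    return tag.translate(STRIP_DIGITS) == "h"


def build_section(children, pos, parent):
    """Append a <section> for the heading at children[pos] to parent, recursively."""
    heading = children[pos]
    level = int(heading.tag[1])
    section = ET.SubElement(parent, "section")
    ET.SubElement(section, "title").text = heading.text

    # Intervening content up to the next heading of any level (mode="next").
    nxt = pos + 1
    while nxt < len(children) and not is_heading(children[nxt].tag):
        if children[nxt].tag == "p":
            ET.SubElement(section, "para").text = children[nxt].text
        nxt += 1

    # Sub-sections: each h(level+1) whose nearest preceding h(level) is this
    # heading, i.e. those before the next h(level) sibling (the h2s..h6s keys).
    for later in range(pos + 1, len(children)):
        tag = children[later].tag
        if tag == f"h{level}":
            break
        if tag == f"h{level + 1}":
            build_section(children, later, section)
    return section


def infer_structure(body):
    """Wrap one top-level <section> per <h1> in a <test> element."""
    children = list(body)
    test = ET.Element("test")
    for pos, el in enumerate(children):
        if el.tag == "h1":
            build_section(children, pos, test)
    return test


html = ET.fromstring(
    "<html><body>"
    "<h1>H1 title 1</h1><p>P for H1-1</p>"
    "<h2>H2 title 1.1</h2><p>P for H2-1.1</p>"
    "<h2>H2 title 1.2</h2><p>P for H2-1.2</p>"
    "<h3>H3 title 1.2.1</h3><p>P for H3-1.2.1</p>"
    "<h4>H4 title 1.2.1.1</h4><p>P for H4-1.2.1.1</p>"
    "<h5>H5 title 1.2.1.1.1</h5><p>P for H5-1.2.1.1.1</p>"
    "<h6>H6 title 1.2.1.1.1.1</h6><p>P for H6-1.2.1.1.1.1</p>"
    "<h2>H2 title 1.3</h2><p>P for H2-1.3</p>"
    "<h1>H1 title 2</h1><p>P for H1-2</p>"
    "<h2>H2 title 2.1</h2><p>P for H2-2.1</p>"
    "</body></html>"
)
result = infer_structure(html.find("body"))
print(ET.tostring(result, encoding="unicode"))
```

Run against the same test data, it should produce the same nesting as `rich.out` above, minus the indentation.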

--
G. Ken Holman               mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.        http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0  +1(613)489-0999 (F:-0995)
ISBN 0-13-065196-6                     Definitive XSLT and XPath
ISBN 0-13-140374-5                             Definitive XSL-FO
ISBN 1-894049-08-X Practical Transformation Using XSLT and XPath
ISBN 1-894049-10-1             Practical Formatting Using XSL-FO
Next public training:          2002-12-08,2003-02-03,06,03-03,06


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list