Subject: RE: Converting non-pure trees to pure trees
From: Kay Michael <Michael.Kay@xxxxxxx>
Date: Tue, 21 Nov 2000 10:00:37 -0000
> I have an XML file which I have automatically converted from
> MS Word; the basic structure is:
> 
> <worddocument>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<pagebreak/>
> 	<p>2/1</p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<pagebreak/>
> 	<p>2/2</p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> </worddocument>
This is a grouping problem, of the kind I call "grouping by position".
Grouping problems in XSLT are not easy: for background, see
www.jenitennison.com.
All grouping problems require two nested loops. The outer loop selects a
representative element for each group, which in this case seems to be a <p>
element that is immediately preceded by a <pagebreak> element:
<xsl:for-each select="p[preceding-sibling::*[1][self::pagebreak]]">
<mongraph id="{.}">
...
</mongraph>
</xsl:for-each>
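To make the outer-loop selection concrete, here is a minimal Python model of the select expression above, using the standard-library xml.etree.ElementTree module rather than an XSLT processor. It assumes a shortened, well-formed copy of the input from the question, with <worddocument> as the context node; the function name group_heads is hypothetical. ElementTree keeps character data in .text and .tail, so list(root) holds only element children, which matches the preceding-sibling::*[1] step (the nearest preceding element, ignoring text).

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed version of the input from the question.
SAMPLE = """<worddocument>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/1</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/2</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
</worddocument>"""


def group_heads(root):
    """Model of p[preceding-sibling::*[1][self::pagebreak]] evaluated with
    <worddocument> as the context node: every <p> child whose immediately
    preceding element sibling is a <pagebreak>."""
    children = list(root)  # element children only, in document order
    return [el for i, el in enumerate(children)
            if el.tag == "p" and i > 0 and children[i - 1].tag == "pagebreak"]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # "".join(itertext()) is the element's string value, like {.} in the AVT.
    print(["".join(p.itertext()) for p in group_heads(root)])  # ['2/1', '2/2']
```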
Inside this you need an inner loop that processes all the elements within
one group. In this case these are all the <p> elements that follow the
"representative" element, up to the next "representative" element. Or, to put
it another way, all following <p> elements whose first preceding
<pagebreak> is the same as the first preceding <pagebreak> of the current
element.
So the inner loop can be:
<xsl:for-each select="following-sibling::p[
                       generate-id(preceding-sibling::pagebreak[1]) =
                       generate-id(current()/preceding-sibling::pagebreak[1])]">
  <xsl:copy-of select="."/>
</xsl:for-each>
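A similar Python sketch (again with ElementTree, not XSLT) models the inner loop's membership test. The helper names nearest_pagebreak and group_members are hypothetical, and the test document uses short, distinct paragraph texts so the groups are easy to tell apart. The position of the nearest <pagebreak> stands in for generate-id(): two <p> elements belong to the same group exactly when that position identifies the same node.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document with distinct paragraph texts.
SAMPLE = """<worddocument>
  <p>intro</p>
  <pagebreak/>
  <p>2/1</p>
  <p>a</p>
  <p>b</p>
  <pagebreak/>
  <p>2/2</p>
  <p>c</p>
</worddocument>"""


def nearest_pagebreak(children, i):
    """Index of the closest <pagebreak> before position i, or None if none.
    Plays the role of preceding-sibling::pagebreak[1]; the index identifies
    one particular node, as generate-id() does."""
    for j in range(i - 1, -1, -1):
        if children[j].tag == "pagebreak":
            return j
    return None


def group_members(root, head):
    """Model of the inner loop: the following-sibling <p> elements of `head`
    whose nearest preceding <pagebreak> is the same node as head's."""
    children = list(root)
    h = next(i for i, el in enumerate(children) if el is head)
    key = nearest_pagebreak(children, h)
    return [el for k, el in enumerate(children)
            if k > h and el.tag == "p" and nearest_pagebreak(children, k) == key]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    head = root[2]  # the <p>2/1</p> element
    print([p.text for p in group_members(root, head)])  # ['a', 'b']
```

Like a naive evaluation of the XPath version, this rescans the preceding siblings for every candidate, so it is roughly quadratic in the number of paragraphs; that is fine for a single Word document but worth knowing for very long inputs.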
In Saxon there is a simpler solution using the saxon:leading() extension
function.
Mike Kay
> 
> I wish to transform this tree using some knowledge I have 
> about the document:
> The first page is always the "introduction", whilst all
> subsequent pages are "monographs".
> 
> <semanticdocument>
> 	<introduction>
> 		<p>paragraph <b>hello</b> <i>world</i></p>
> 		<p>paragraph <b>hello</b> <i>world</i></p>
> 		<p>paragraph <b>hello</b> <i>world</i></p>
> 	</introduction>
> 	<mongraphs>
> 		<mongraph id="2/1">
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 		</mongraph>
> 		<mongraph id="2/2">
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 		</mongraph>
> 	</mongraphs>
> </semanticdocument>
> 
> 
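For comparison, the whole transformation asked for in the question can also be written as a single forward scan instead of two nested loops. The sketch below uses Python with ElementTree and assumes the input contains only <p> and <pagebreak> children. It treats everything before the first <pagebreak> as the introduction and uses the first <p> after each <pagebreak> as the monograph id without copying it, which is how the desired output above reads. The element names mongraph and mongraphs are kept as spelled in the question; to_semantic is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input with distinct texts so the grouping is easy to check.
SAMPLE = """<worddocument>
  <p>intro 1</p>
  <p>intro 2</p>
  <pagebreak/>
  <p>2/1</p>
  <p>first monograph</p>
  <pagebreak/>
  <p>2/2</p>
  <p>second monograph, para 1</p>
  <p>second monograph, para 2</p>
</worddocument>"""


def to_semantic(root):
    """Group the flat <worddocument> children into <introduction> and
    one <mongraph> per page, in a single pass over the children."""
    out = ET.Element("semanticdocument")
    intro = ET.SubElement(out, "introduction")
    monographs = ET.SubElement(out, "mongraphs")
    target = intro        # the group that receives the next ordinary <p>
    after_break = False   # True while waiting for the <p> after a <pagebreak>
    for el in root:
        if el.tag == "pagebreak":
            after_break = True
        elif el.tag == "p" and after_break:
            # The first <p> on a new page names the group; it is not copied.
            target = ET.SubElement(monographs, "mongraph",
                                   id="".join(el.itertext()))
            after_break = False
        elif el.tag == "p":
            target.append(copy.deepcopy(el))
    return out


if __name__ == "__main__":
    result = to_semantic(ET.fromstring(SAMPLE))
    print(ET.tostring(result, encoding="unicode"))
```

The single pass avoids repeated sibling scans; in XSLT 1.0, whose variables cannot be updated, the nested-loop form described in the reply is the usual way to express the same grouping.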
 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list