RE: [xsl] Transforming flat "WordML" source to a hierarchical XML output.

Subject: RE: [xsl] Transforming flat "WordML" source to a hierarchical XML output.
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 12 Sep 2007 11:38:59 +0100
There's an example of XSLT 2.0 code for converting a hierarchy expressed as
a flat structure with level numbers into a real XML hierarchy at

http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
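
For the WordML case in the question, a minimal sketch of the same idea
might look like the stylesheet below. It is not the code from the paper.
Assumptions: the WordML 2003 namespace, all w:p elements being siblings
under one parent, list levels that never increase by more than one at a
time, and a list type taken from the w:pStyle of the first paragraph in
each list. The f:level function, the make-list template and the
<Document> wrapper are names invented for this illustration. Paragraphs
without w:ilvl are given level -1, so the grouping key is never empty.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:f="urn:example:local-functions"
    exclude-result-prefixes="xs w f">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: list level of a paragraph, or -1 when the
       paragraph has no w:ilvl (that is, it is not a list paragraph). -->
  <xsl:function name="f:level" as="xs:integer">
    <xsl:param name="p" as="element(w:p)"/>
    <xsl:sequence
        select="(xs:integer($p/w:pPr/w:listPr/w:ilvl/@w:val), -1)[1]"/>
  </xsl:function>

  <!-- Suppress text reached through the built-in rules. -->
  <xsl:template match="text()"/>

  <!-- Any element holding w:p children: split them into runs of list
       paragraphs and runs of ordinary paragraphs. -->
  <xsl:template match="*[w:p]">
    <Document>
      <xsl:for-each-group select="w:p" group-adjacent="f:level(.) ge 0">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:call-template name="make-list">
              <xsl:with-param name="paras" select="current-group()"/>
              <xsl:with-param name="level" select="0"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:for-each select="current-group()">
              <Paragraph>
                <xsl:value-of select="w:r/w:t" separator=""/>
              </Paragraph>
            </xsl:for-each>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </Document>
  </xsl:template>

  <!-- Build one List from paragraphs at $level or deeper. Each paragraph
       at exactly $level starts a new Item; the deeper paragraphs that
       follow it become a nested List inside that Item. -->
  <xsl:template name="make-list">
    <xsl:param name="paras" as="element(w:p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <List type="{if ($paras[1]/w:pPr/w:pStyle/@w:val = 'Bulleted')
                 then 'bulleted' else 'numbered'}">
      <xsl:for-each-group select="$paras"
          group-starting-with="w:p[f:level(.) eq $level]">
        <Item>
          <xsl:value-of select="current-group()[1]/w:r/w:t" separator=""/>
          <xsl:if test="current-group()[2]">
            <xsl:call-template name="make-list">
              <xsl:with-param name="paras"
                  select="current-group()[position() gt 1]"/>
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </xsl:if>
        </Item>
      </xsl:for-each-group>
    </List>
  </xsl:template>

</xsl:stylesheet>
```

The outer xsl:for-each-group uses a boolean group-adjacent key to
separate list runs from normal paragraphs. The recursive template then
uses group-starting-with at each level, which is what turns the level
numbers into nesting.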

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: David Medley [mailto:DAVEMEDLEY@xxxxxxxxxx] 
> Sent: 11 September 2007 15:27
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Transforming flat "WordML" source to a 
> hierarchical XML output.
> 
> I am using the following:
> 
> Saxon XSLT processor, version 8.9
> 
> XSLT 2.0
> 
> 
> I am trying to process XML source generated by Microsoft Word 
> (WordML).
> 
> WordML has no concept of hierarchy: paragraphs are flat siblings, 
> and a list paragraph records its nesting depth in w:ilvl. The 
> source looks like this:
> 
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Normal"/>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Normal Paragraph</w:t>
>                 </w:r>
>         </w:p> 
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="0"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Top Level List</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="0"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Top Level List</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Bulleted"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="1"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Nested List Level 1</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Bulleted"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="1"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Nested List Level 1</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="2"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Nested List Level 2</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="3"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Nested List Level 3</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="4"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Nested List Level 4</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="4"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr>
>                                 <w:i/>
>                         </w:rPr>
>                         <w:t>Nested List Level 4</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="5"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr>
>                                 <w:b/>
>                         </w:rPr>
>                         <w:t>Nested List Level 5</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Number"/>
>                         <w:listPr>
>                                 <w:ilvl w:val="5"/>
>                         </w:listPr>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr>
>                                 <w:u w:val="single"/>
>                         </w:rPr>
>                         <w:t>Nested List Level 5</w:t>
>                 </w:r>
>         </w:p>
>         <w:p>
>                 <w:pPr>
>                         <w:pStyle w:val="Normal"/>
>                 </w:pPr>
>                 <w:r>
>                         <w:rPr/>
>                         <w:t>Normal Paragraph</w:t>
>                 </w:r>
>         </w:p>
> 
> This displays in Word as follows:
> 
> Normal Paragraph
> 1.      Top Level List
> 2.      Top Level List
>         *       Nested List Level 1
>         *       Nested List Level 1
>                 1.      Nested List Level 2
>                         a.      Nested List Level 3
>                                 i.      Nested List Level 4
>                                 ii.     Nested List Level 4
>                                         1.      Nested List Level 5
>                                         2.      Nested List Level 5
> Normal Paragraph
> 
> 
> I need the outcome to be as follows:
> 
>         <Paragraph>Normal Paragraph</Paragraph>
>         <List type="numbered">
>           <Item>Top Level List</Item>
>           <Item>Top Level List
>             <List type="bulleted">
>               <Item>Nested List Level 1</Item>
>               <Item>Nested List Level 1
>                 <List type="numbered">
>                   <Item>Nested List Level 2
>                     <List type="numbered">
>                       <Item>Nested List Level 3
>                         <List type="numbered">
>                           <Item>Nested List Level 4</Item>
>                           <Item>Nested List Level 4
>                             <List type="numbered">
>                               <Item>Nested List Level 5</Item>
>                               <Item>Nested List Level 5</Item>
>                             </List>
>                           </Item>
>                         </List>
>                       </Item>
>                     </List>
>                   </Item>
>                 </List>
>               </Item>
>             </List>
>           </Item>
>         </List>
>         <Paragraph>Normal Paragraph</Paragraph>
> 
> 
> I think what is required is a grouping procedure that groups 
> the paragraphs by the value of the XPath 
> 'w:pPr/w:listPr/w:ilvl/@w:val' for each paragraph.
> My attempt to do this has been unsuccessful: not all paragraphs 
> have a 'w:pPr/w:listPr/w:ilvl/@w:val' value, so the grouping 
> falls over.
> 
> I hope you can help me with this. Thank you for reading.
> 
> 
> Thank you,
> David Medley
> IT Specialist
> 
> Application Services, GBS
> IBM Office Internal: 299263 External: +44 (0) 1252 55 9263
> Mobile: +44 (0) 7790-778801
> E-mail: davemedley@xxxxxxxxxx
> Notes: David Medley/UK/IBM@IBMGB
> 
> 
> 
> 
> 
> 
> 
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales 
> with number 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, 
> Hampshire PO6 3AU
