[xsl] Transforming flat "WordML" source to a hierarchical XML output.

Subject: [xsl] Transforming flat "WordML" source to a hierarchical XML output.
From: David Medley <DAVEMEDLEY@xxxxxxxxxx>
Date: Tue, 11 Sep 2007 15:27:14 +0100
I am using the following:

Saxon XSLT processor, version 8.9

XSLT 2.0


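I am trying to process XML source generated by Microsoft Word (WordML).

WordML has no concept of hierarchy: every paragraph is a flat sibling of 
the others, and a list item is marked only by its w:ilvl value. Each 
paragraph in the source looks like the following: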

        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Normal"/>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Normal Paragraph</w:t>
                </w:r>
        </w:p> 
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="0"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Top Level List</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="0"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Top Level List</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Bulleted"/>
                        <w:listPr>
                                <w:ilvl w:val="1"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 1</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Bulleted"/>
                        <w:listPr>
                                <w:ilvl w:val="1"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 1</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="2"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 2</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="3"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 3</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="4"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 4</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="4"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:i/>
                        </w:rPr>
                        <w:t>Nested List Level 4</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="5"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:b/>
                        </w:rPr>
                        <w:t>Nested List Level 5</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="5"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:u w:val="single"/>
                        </w:rPr>
                        <w:t>Nested List Level 5</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Normal"/>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Normal Paragraph</w:t>
                </w:r>
        </w:p>

This displays in Word as follows:

Normal Paragraph
1.      Top Level List
2.      Top Level List
        *       Nested List Level 1
        *       Nested List Level 1
                1.      Nested List Level 2
                        a.      Nested List Level 3
                                i.      Nested List Level 4
                                ii.     Nested List Level 4
                                        1.      Nested List Level 5
                                        2.      Nested List Level 5
Normal Paragraph


I need the outcome to be as follows:

        <Paragraph>Normal Paragraph</Paragraph>
        <List type="numbered">
                <Item>Top Level List</Item>
                <Item>Top Level List
                        <List type="bulleted">
                                <Item>Nested List Level 1</Item>
                                <Item>Nested List Level 1
                                        <List type="numbered">
                                                <Item>Nested List Level 2
                                                        <List type="numbered">
                                                                <Item>Nested List Level 3
                                                                        <List type="numbered">
                                                                                <Item>Nested List Level 4</Item>
                                                                                <Item>Nested List Level 4
                                                                                        <List type="numbered">
                                                                                                <Item>Nested List Level 5</Item>
                                                                                                <Item>Nested List Level 5</Item>
                                                                                        </List>
                                                                                </Item>
                                                                        </List>
                                                                </Item>
                                                        </List>
                                                </Item>
                                        </List>
                                </Item>
                        </List>
                </Item>
        </List>
        <Paragraph>Normal Paragraph</Paragraph>


I think what is required is a grouping procedure that groups the 
paragraphs by the value of the XPath 'w:pPr/w:listPr/w:ilvl/@w:val' for 
each paragraph.
My attempts so far have been unsuccessful: not all paragraphs have a value 
for 'w:pPr/w:listPr/w:ilvl/@w:val' (the Normal paragraphs have no 
w:listPr at all), and the grouping falls over when it is missing.
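
As a rough sketch of the kind of grouping I mean (not a tested solution), 
one approach might be to treat a missing w:ilvl as level -1, use 
group-adjacent to separate list paragraphs from ordinary ones, and then 
recurse with group-starting-with, one list level per call. The f:level 
function, its urn:example:local namespace, the make-list template name 
and the Document wrapper element are all hypothetical names chosen for 
illustration. The sketch also assumes that the paragraphs are direct 
children of w:body, that list levels never skip a step, and that the 
w:pStyle values are 'Number' and 'Bulleted' as in the sample above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="xs w f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: the list level of a paragraph,
       or -1 when the paragraph has no w:ilvl at all. -->
  <xsl:function name="f:level" as="xs:integer">
    <xsl:param name="p" as="element(w:p)"/>
    <xsl:sequence select="if ($p/w:pPr/w:listPr/w:ilvl/@w:val)
                          then xs:integer($p/w:pPr/w:listPr/w:ilvl/@w:val)
                          else -1"/>
  </xsl:function>

  <!-- Suppress text that is reached only through the built-in templates. -->
  <xsl:template match="text()"/>

  <!-- Assumption: the w:p elements are direct children of w:body. -->
  <xsl:template match="w:body">
    <!-- Document is a hypothetical wrapper so the result is well-formed. -->
    <Document>
      <!-- Split the paragraphs into runs of list items and runs of non-list items. -->
      <xsl:for-each-group select="w:p" group-adjacent="f:level(.) ge 0">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:call-template name="make-list">
              <xsl:with-param name="paras" select="current-group()"/>
              <xsl:with-param name="level" select="0"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:for-each select="current-group()">
              <Paragraph>
                <xsl:value-of select="w:r/w:t" separator=""/>
              </Paragraph>
            </xsl:for-each>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </Document>
  </xsl:template>

  <!-- Builds one List from a run of paragraphs; deeper paragraphs that
       follow an item are handed to a recursive call for the next level. -->
  <xsl:template name="make-list">
    <xsl:param name="paras" as="element(w:p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <!-- The list type is taken from the style of the first paragraph in the run. -->
    <List type="{if ($paras[1]/w:pPr/w:pStyle/@w:val = 'Bulleted')
                 then 'bulleted' else 'numbered'}">
      <!-- Each paragraph at the current level starts a new Item. -->
      <xsl:for-each-group select="$paras"
                          group-starting-with="w:p[f:level(.) eq $level]">
        <Item>
          <xsl:value-of select="current-group()[1]/w:r/w:t" separator=""/>
          <xsl:if test="current-group()[2]">
            <xsl:call-template name="make-list">
              <xsl:with-param name="paras" select="current-group()[position() gt 1]"/>
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </xsl:if>
        </Item>
      </xsl:for-each-group>
    </List>
  </xsl:template>

</xsl:stylesheet>
```

Giving the non-list paragraphs a level of -1 means every paragraph has a 
grouping key, so the grouping would no longer depend on w:ilvl being 
present.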

I hope you can help me with this. Thank you for reading.


Thank you,
David Medley 
IT Specialist

Application Services, GBS
IBM Office Internal: 299263 External: +44 (0) 1252 55 9263
Mobile: +44 (0) 7790-778801
E-mail: davemedley@xxxxxxxxxx
Notes: David Medley/UK/IBM@IBMGB







