Re: [xsl] Transforming flat "WordML" source to a hierarchical XML output.

Subject: Re: [xsl] Transforming flat "WordML" source to a hierarchical XML output.
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Sep 2007 11:40:25 -0400
David,

If you show us the code you've tried (reduced to a minimal illustration, please), it will be easier to help.

From what we can see, it appears your diagnosis could be correct. If you're using group-adjacent="w:pPr/w:listPr/w:ilvl/@w:val", you could try "(w:pPr/w:listPr/w:ilvl/@w:val,'0')[1]", which would provide '0' as a grouping key value for w:p elements that return nothing from that XPath.
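To make the idea concrete, here is a minimal sketch of a stylesheet using such a fallback key. Several things in it are assumptions rather than anything taken from your code: the namespace URI is the Word 2003 WordML one, the match pattern simply picks up any element that has w:p children, and the <group> and <para> result elements are hypothetical placeholders. Note also that the sketch uses '-1' rather than '0' as the fallback, because with '0' a plain paragraph sitting next to a level-0 list item would land in the same group as that item.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: paragraphs are grouped within whatever element
       directly contains them (w:body, a section wrapper, etc.) -->
  <xsl:template match="*[w:p]">
    <!-- The key is the list level when present, otherwise the
         fallback string '-1', so every w:p gets a grouping key -->
    <xsl:for-each-group select="w:p"
        group-adjacent="(w:pPr/w:listPr/w:ilvl/@w:val, '-1')[1]">
      <!-- <group> and <para> are hypothetical names for illustration -->
      <group level="{current-grouping-key()}">
        <xsl:for-each select="current-group()">
          <para>
            <!-- Concatenate the text of all runs in the paragraph -->
            <xsl:value-of select="w:r/w:t" separator=""/>
          </para>
        </xsl:for-each>
      </group>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Suppress stray text outside the grouped paragraphs -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Run against the sample quoted below, this should give eight flat groups with levels -1, 0, 1, 2, 3, 4, 5 and -1. It only shows that the grouping no longer falls over on paragraphs without w:ilvl; building the nested List/Item structure would still need a further step, for example recursing over each group by level.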

Hm: w:p, nice element name.

Cheers,
Wendell Piez

At 10:27 AM 9/11/2007, you wrote:
I am using the following:

Saxon XSLT processor, version 8.9

XSLT 2.0


I am trying to process XML source generated by Microsoft Word (WordML).


WordML has no concept of hierarchy, so the paragraphs in the source
are all flat siblings, like this:

        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Normal"/>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Normal Paragraph</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="0"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Top Level List</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="0"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Top Level List</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Bulleted"/>
                        <w:listPr>
                                <w:ilvl w:val="1"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 1</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Bulleted"/>
                        <w:listPr>
                                <w:ilvl w:val="1"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 1</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="2"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 2</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="3"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 3</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="4"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 4</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="4"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:i/>
                        </w:rPr>
                        <w:t>Nested List Level 4</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="5"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:b/>
                        </w:rPr>
                        <w:t>Nested List Level 5</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="5"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:u w:val="single"/>
                        </w:rPr>
                        <w:t>Nested List Level 5</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Normal"/>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Normal Paragraph</w:t>
                </w:r>
        </w:p>

This displays in Word as follows:

Normal Paragraph
1.      Top Level List
2.      Top Level List
        *       Nested List Level 1
        *       Nested List Level 1
                1.      Nested List Level 2
                        a.      Nested List Level 3
                                i.      Nested List Level 4
                                ii.     Nested List Level 4
                                        1.      Nested List Level 5
                                        2.      Nested List Level 5
Normal Paragraph


I need the outcome to be as follows:


        <Paragraph>Normal Paragraph</Paragraph>
        <List type="numbered">
                <Item>Top Level List</Item>
                <Item>Top Level List
                        <List type="bulleted">
                                <Item>Nested List Level 1</Item>
                                <Item>Nested List Level 1
                                        <List type="numbered">
                                                <Item>Nested List Level 2
                                                        <List type="numbered">
                                                                <Item>Nested List Level 3
                                                                        <List type="numbered">
                                                                                <Item>Nested List Level 4</Item>
                                                                                <Item>Nested List Level 4
                                                                                        <List type="numbered">
                                                                                                <Item>Nested List Level 5</Item>
                                                                                                <Item>Nested List Level 5</Item>
                                                                                        </List>
                                                                                </Item>
                                                                        </List>
                                                                </Item>
                                                        </List>
                                                </Item>
                                        </List>
                                </Item>
                        </List>
                </Item>
        </List>
        <Paragraph>Normal Paragraph</Paragraph>


I think what is required is a grouping procedure that groups the paragraphs by the value of the XPath 'w:pPr/w:listPr/w:ilvl/@w:val' for each paragraph. My attempt at this has been unsuccessful: not all paragraphs have a value for 'w:pPr/w:listPr/w:ilvl/@w:val', and so the grouping falls over.

I hope you can help me in this matter, thank you for reading.


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
