RE: [xsl] Re: up-converting

Subject: RE: [xsl] Re: up-converting
From: Jim_Albright@xxxxxxxxxxxx
Date: Tue, 28 Sep 2004 08:07:15 -0400
I have a solution for the up-converting problem that I had. It isn't as 
elegant as I was hoping for. Maybe someone here can give me a few more 
pointers.
Thanks again for including the for-each-group structure, as that makes the 
solution much easier.

My general problem is converting a flat XML (WordML) document into one 
with hierarchy.
After tossing out all of the formatting info, the first step is to map all 
the paragraphs that indicate divs to their appropriate level. I use 
a, b, c, d, ... as the new element names in order to make this more generic.
The a, b, c, ... elements hold the head or title for the div.
Divs may be nested: a, b, c, d.
Some div levels may be omitted: a, b, d.
Divs may be followed by other divs or by paragraphs. Paragraphs may contain 
spans.
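
As a rough illustration of this mapping step, here is a minimal sketch. It 
assumes the input is Word 2003 WordML, that heading paragraphs carry the 
style ids Heading1 to Heading3 (an assumption; real documents may use other 
ids), and that every other paragraph becomes a hypothetical p element. Text 
is taken from w:r/w:t only, so spans are flattened here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  exclude-result-prefixes="w">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Flatten every body paragraph into one child of document. -->
  <xsl:template match="/">
    <document>
      <xsl:apply-templates select="//w:body//w:p"/>
    </document>
  </xsl:template>

  <!-- Assumed style ids: Heading1..Heading3 map to a..c, anything else to p. -->
  <xsl:template match="w:p">
    <xsl:variable name="style" select="string(w:pPr/w:pStyle/@w:val)"/>
    <xsl:variable name="name" select="
        if ($style = 'Heading1') then 'a'
        else if ($style = 'Heading2') then 'b'
        else if ($style = 'Heading3') then 'c'
        else 'p'"/>
    <xsl:element name="{$name}">
      <!-- Join the text of all runs without inserting separators. -->
      <xsl:value-of select="w:r/w:t" separator=""/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Extending the mapping to deeper levels would just mean adding more branches 
to the name lookup.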

Next, use the for-each-group structure to put an aa element around each 
group that starts with an a element.
Next, use the for-each-group structure to put a bb element around each 
group that starts with a b element, renaming aa to aaa.
...
Each of these steps builds the required hierarchy one level at a time. 
Since some divs may be omitted, I couldn't find a way to combine these 
steps.
Next, the head/title is pulled out.
Finally, toss out any div with no head/title.
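
One possible way to combine the steps above is a recursive named template 
that groups at one level and then calls itself for the next. This is only a 
sketch, not a tested replacement for the multi-pass approach: the template 
name nest, the levels variable, and the depth parameter are hypothetical 
names, and it assumes the heads are exactly a to e. When a group does not 
start with the head for the current level, it is passed on to the next level 
unwrapped, which is how an omitted level is tolerated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Assumption: head element names, outermost level first. -->
  <xsl:variable name="levels" select="('a', 'b', 'c', 'd', 'e')"/>

  <xsl:template match="document">
    <document>
      <xsl:call-template name="nest">
        <xsl:with-param name="nodes" select="*"/>
        <xsl:with-param name="depth" select="1"/>
      </xsl:call-template>
    </document>
  </xsl:template>

  <!-- Hypothetical helper: groups $nodes by the head at $depth, then recurses. -->
  <xsl:template name="nest">
    <xsl:param name="nodes" as="element()*"/>
    <xsl:param name="depth" as="xs:integer"/>
    <xsl:variable name="head" select="$levels[$depth]"/>
    <xsl:choose>
      <!-- Past the deepest level: nothing left to nest, copy as is. -->
      <xsl:when test="$depth gt count($levels)">
        <xsl:copy-of select="$nodes"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:for-each-group select="$nodes"
          group-starting-with="*[local-name() = $head]">
          <xsl:choose>
            <!-- Group begins with the head for this level: wrap it. -->
            <xsl:when test="self::*[local-name() = $head]">
              <xsl:element name="div-{$head}">
                <title><xsl:copy-of select="node()"/></title>
                <xsl:call-template name="nest">
                  <xsl:with-param name="nodes" select="current-group()[position() gt 1]"/>
                  <xsl:with-param name="depth" select="$depth + 1"/>
                </xsl:call-template>
              </xsl:element>
            </xsl:when>
            <!-- No head at this level: try the next level without a wrapper. -->
            <xsl:otherwise>
              <xsl:call-template name="nest">
                <xsl:with-param name="nodes" select="current-group()"/>
                <xsl:with-param name="depth" select="$depth + 1"/>
              </xsl:call-template>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because a wrapper is only written when a head is actually present, this 
variant should never produce a div without a title, so the final clean-up 
step would not be needed with it.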

Sample input:
<?xml version="1.0" encoding="UTF-8"?>
<document>
        <a>level aaaa head 1</a>
        <b>level bbbb head 2</b>
        <c>level ccccc head 3</c>
        <dfg>cc 4 blah</dfg>
        <e>level eeee head 5 </e>
        <fhh>cc blah 6</fhh>
        <c>level ccccc head 7</c>
        <df>cc 8 blah<kkk>kkk within df within c</kkk>
        </df>
        <d>level dddd head 9</d>
        <iuo>dd 10 blah</iuo>
        <jtt>dd blah 11</jtt>
        <c>level ccccc head 12</c>
        <df>cc 13 blah</df>
        <e>cc level eeeee  head 14</e>
        <fss>ee blah 15</fss>
        <b>level bbbbb head 16</b>
        <c>level ccccc  head 17</c>
        <df>cc 18  blah</df>
        <e>cc level eeeee head 19</e>
        <fhy>ee blah 20</fhy>
</document>

The required output is:
<?xml version="1.0" encoding="UTF-8"?>
<document>
   <div-a>
      <title>level aaaa head 1</title>
      <div-b>
         <title>level bbbb head 2</title>
         <div-c>
            <title>level ccccc head 3</title>
            <dfg>cc 4 blah</dfg>
            <div-e>
               <title>level eeee head 5 </title>
               <fhh>cc blah 6</fhh>
            </div-e>
         </div-c>
         <div-c>
            <title>level ccccc head 7</title>
            <df>cc 8 blah<kkk>kkk within df within c</kkk>
            </df>
            <div-d>
               <title>level dddd head 9</title>
               <iuo>dd 10 blah</iuo>
               <jtt>dd blah 11</jtt>
            </div-d>
         </div-c>
         <div-c>
            <title>level ccccc head 12</title>
            <df>cc 13 blah</df>
            <div-e>
               <title>cc level eeeee  head 14</title>
               <fss>ee blah 15</fss>
            </div-e>
         </div-c>
      </div-b>
      <div-b>
         <title>level bbbbb head 16</title>
         <div-c>
            <title>level ccccc  head 17</title>
            <df>cc 18  blah</df>
            <div-e>
               <title>cc level eeeee head 19</title>
               <fhy>ee blah 20</fhy>
            </div-e>
         </div-c>
      </div-b>
   </div-a>
</document>



The first grouping pass uses the for-each-group structure to put an aa 
element around the a elements.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:template match="document">
                <document>
                        <xsl:for-each-group select="*" group-starting-with="a">
                                <aa>
                                        <xsl:for-each select="current-group()">
                                                <xsl:copy-of select="."/>
                                        </xsl:for-each>
                                </aa>
                        </xsl:for-each-group>
                </document>
        </xsl:template>
        <xsl:template match="@*|node()" name="copy-current-node">
                <xsl:copy>
                        <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
        </xsl:template>
</xsl:stylesheet>


with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <aa>
      <a>level aaaa head 1</a>
      <b>level bbbb head 2</b>
      <c>level ccccc head 3</c>
      <dfg>cc 4 blah</dfg>
      <e>level eeee head 5 </e>
      <fhh>cc blah 6</fhh>
      <c>level ccccc head 7</c>
      <df>cc 8 blah<kkk>kkk within df within c</kkk>
      </df>
      <d>level dddd head 9</d>
      <iuo>dd 10 blah</iuo>
      <jtt>dd blah 11</jtt>
      <c>level ccccc head 12</c>
      <df>cc 13 blah</df>
      <e>cc level eeeee  head 14</e>
      <fss>ee blah 15</fss>
      <b>level bbbbb head 16</b>
      <c>level ccccc  head 17</c>
      <df>cc 18  blah</df>
      <e>cc level eeeee head 19</e>
      <fhy>ee blah 20</fhy>
   </aa>
</document>

The next pass uses the for-each-group structure to put a bb element around 
each group that starts with a b element, and renames aa to aaa.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:template match="document">
                <document>
                        <xsl:apply-templates/>
                </document>
        </xsl:template>
        <xsl:template match="aa">
                <aaa>
                        <xsl:for-each-group select="*" group-starting-with="b">
                                <bb>
                                        <xsl:for-each select="current-group()">
                                                <xsl:copy-of select="."/>
                                        </xsl:for-each>
                                </bb>
                        </xsl:for-each-group>
                </aaa>
        </xsl:template>
        <xsl:template match="@*|node()" name="copy-current-node">
                <xsl:copy>
                        <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
        </xsl:template>
</xsl:stylesheet>

with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <aaa>
      <bb>
         <a>level aaaa head 1</a>
      </bb>
      <bb>
         <b>level bbbb head 2</b>
         <c>level ccccc head 3</c>
         <dfg>cc 4 blah</dfg>
         <e>level eeee head 5 </e>
         <fhh>cc blah 6</fhh>
         <c>level ccccc head 7</c>
         <df>cc 8 blah<kkk>kkk within df within c</kkk>
 
         </df>
         <d>level dddd head 9</d>
         <iuo>dd 10 blah</iuo>
         <jtt>dd blah 11</jtt>
         <c>level ccccc head 12</c>
         <df>cc 13 blah</df>
         <e>cc level eeeee  head 14</e>
         <fss>ee blah 15</fss>
      </bb>
      <bb>
         <b>level bbbbb head 16</b>
         <c>level ccccc  head 17</c>
         <df>cc 18  blah</df>
         <e>cc level eeeee head 19</e>
         <fhy>ee blah 20</fhy>
      </bb>
   </aaa>
</document>


Continue adding the levels in the same way:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:template match="document">
                <document>
                        <xsl:apply-templates/>
                </document>
        </xsl:template>
        <xsl:template match="aaa">
                <aaa>
                        <xsl:apply-templates/>
                </aaa>
        </xsl:template>
        <xsl:template match="bb">
                <bbb>
                        <xsl:for-each-group select="*" group-starting-with="c">
                                <cc>
                                        <xsl:for-each select="current-group()">
                                                <xsl:copy-of select="."/>
                                        </xsl:for-each>
                                </cc>
                        </xsl:for-each-group>
                </bbb>
        </xsl:template>
        <xsl:template match="@*|node()" name="copy-current-node">
                <xsl:copy>
                        <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
        </xsl:template>
</xsl:stylesheet>

with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <aaa>
      <bbb>
         <cc>
            <a>level aaaa head 1</a>
         </cc>
      </bbb>
      <bbb>
         <cc>
            <b>level bbbb head 2</b>
         </cc>
         <cc>
            <c>level ccccc head 3</c>
            <dfg>cc 4 blah</dfg>
            <e>level eeee head 5 </e>
            <fhh>cc blah 6</fhh>
         </cc>
         <cc>
            <c>level ccccc head 7</c>
            <df>cc 8 blah<kkk>kkk within df within c</kkk>
 
 
            </df>
            <d>level dddd head 9</d>
            <iuo>dd 10 blah</iuo>
            <jtt>dd blah 11</jtt>
         </cc>
         <cc>
            <c>level ccccc head 12</c>
            <df>cc 13 blah</df>
            <e>cc level eeeee  head 14</e>
            <fss>ee blah 15</fss>
         </cc>
      </bbb>
      <bbb>
         <cc>
            <b>level bbbbb head 16</b>
         </cc>
         <cc>
            <c>level ccccc  head 17</c>
            <df>cc 18  blah</df>
            <e>cc level eeeee head 19</e>
            <fhy>ee blah 20</fhy>
         </cc>
      </bbb>
   </aaa>
</document>

....


Finally, at aaa, see if there is a descendant a. If so, that is the title 
for this group; otherwise there is no title. The other levels are handled 
the same way.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:strip-space elements="*"/>
        <xsl:template match="document">
                <document>
                        <xsl:apply-templates/>
                </document>
        </xsl:template>
        <xsl:template match="aaa">
                <div-a>
                        <xsl:choose>
                                <xsl:when test="descendant::a">
                                        <title>
                                                <xsl:apply-templates select="descendant::a"/>
                                        </title>
                                        <xsl:apply-templates select="child::*"/>
                                </xsl:when>
                                <xsl:otherwise>
                                        <xsl:apply-templates select="child::*"/>
                                </xsl:otherwise>
                        </xsl:choose>
                </div-a>
        </xsl:template>
        <xsl:template match="bbb">
                <xsl:choose>
                        <xsl:when test="descendant::b">
                                <div-b>
                                        <title>
                                                <xsl:apply-templates select="descendant::b"/>
                                        </title>
                                        <xsl:apply-templates select="child::*"/>
                                </div-b>
                        </xsl:when>
                        <xsl:otherwise>
                                <xsl:apply-templates select="child::*"/>
                        </xsl:otherwise>
                </xsl:choose>
        </xsl:template>
        <xsl:template match="ccc">
                <xsl:choose>
                        <xsl:when test="descendant::c">
                                <div-c>
                                        <title>
                                                <xsl:apply-templates select="descendant::c"/>
                                        </title>
                                        <xsl:apply-templates select="descendant::*[preceding-sibling::c]"/>
                                        <xsl:apply-templates select="child::*"/>
                                </div-c>
                        </xsl:when>
                        <xsl:otherwise>
                                <xsl:apply-templates select="child::*[not(c)]|descendant::*[preceding-sibling::c]"/>
                        </xsl:otherwise>
                </xsl:choose>
        </xsl:template>
        <xsl:template match="ddd">
                <xsl:choose>
                        <xsl:when test="descendant::d">
                                <div-d>
                                        <title>
                                                <xsl:apply-templates select="descendant::d"/>
                                        </title>
                                        <xsl:apply-templates select="descendant::*[preceding-sibling::d]"/>
                                        <xsl:apply-templates select="child::*"/>
                                </div-d>
                        </xsl:when>
                        <xsl:otherwise>
                                <xsl:apply-templates select="child::*|descendant::*[preceding-sibling::d]"/>
                        </xsl:otherwise>
                </xsl:choose>
        </xsl:template>
        <xsl:template match="eee">
                <xsl:choose>
                        <xsl:when test="descendant::e">
                                <div-e>
                                        <title>
                                                <xsl:apply-templates select="descendant::e"/>
                                        </title>
                                        <xsl:apply-templates select="descendant::*[preceding-sibling::e]"/>
                                        <xsl:apply-templates select="child::*"/>
                                </div-e>
                        </xsl:when>
                        <xsl:otherwise>
                                <xsl:apply-templates select="child::*|descendant::*[preceding-sibling::e]"/>
                        </xsl:otherwise>
                </xsl:choose>
        </xsl:template>

        <xsl:template match="a|b|c|d|e|f|g|h|i">
                <xsl:apply-templates/>
        </xsl:template>
        <xsl:template match="@*|node()" >
                <xsl:copy>
                        <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
        </xsl:template>
</xsl:stylesheet>

We can then get rid of the divs that have no title, which solves the 
missing div problem.
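
If that clean-up is run as its own pass, it could look something like the 
sketch below. It assumes an earlier pass wrote div-a, div-b, ... wrappers 
unconditionally and gave each one a title child only when a head was found; 
both of those are assumptions about the intermediate document, not something 
the stylesheets above guarantee.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- A div-* wrapper with no title child: drop the wrapper, keep its content. -->
  <xsl:template match="*[starts-with(local-name(), 'div-')][not(title)]">
    <xsl:apply-templates select="node()"/>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Nothing in this pass needs XSLT 2.0, so it should also run with version="1.0".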



Jim Albright
704 843-0582
Wycliffe Bible Translators
