RE: [xsl] Re: up-converting

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 28 Sep 2004 14:23:33 +0100
I haven't looked through your code in detail, but it looks similar to a
problem I used as an exercise at the Oxford Summer School. There we had a set
of records with COBOL-like level numbers:

<A level="1"/>
<B level="2"/>
<C level="3"/>
<D level="2"/>

and the task was to create a hierarchically nested structure. (The actual
input was a GEDCOM file.)

The solution is a recursive grouping like this:

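<!-- assumes xmlns:xs="http://www.w3.org/2001/XMLSchema" on xsl:stylesheet -->
<xsl:template name="g">
  <xsl:param name="sequence" as="element()*"/>
  <xsl:param name="level" as="xs:integer"/>
  <xsl:for-each-group select="$sequence"
                      group-starting-with="*[@level=$level]">
    <!-- "." is the record that starts the group -->
    <xsl:copy>
      <xsl:call-template name="g">
        <xsl:with-param name="sequence" select="current-group() except ."/>
        <xsl:with-param name="level" select="$level+1"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:for-each-group>
</xsl:template>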
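To run the template, something has to start the recursion at level 1. A minimal complete stylesheet, sketched under two assumptions: an XSLT 2.0 processor such as Saxon, and the records wrapped in a single root element (the name records is hypothetical, e.g. <records><A level="1"/><B level="2"/><C level="3"/><D level="2"/></records>):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>

  <!-- Start the recursion at level 1 with all the records;
       the (hypothetical) wrapper element itself is kept -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:call-template name="g">
        <xsl:with-param name="sequence" select="*"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <!-- Each record at $level starts a group; the rest of the group
       is grouped again one level deeper, inside a copy of that record -->
  <xsl:template name="g">
    <xsl:param name="sequence" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$sequence"
                        group-starting-with="*[@level=$level]">
      <xsl:copy>
        <xsl:call-template name="g">
          <xsl:with-param name="sequence" select="current-group() except ."/>
          <xsl:with-param name="level" select="$level+1"/>
        </xsl:call-template>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For the four records above this should give, indentation aside, <records><A><B><C/></B><D/></A></records>: B and D each start a level-2 group inside A, and C starts the only level-3 group inside B. Note that xsl:copy is a shallow copy, so the level attributes are not carried over.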

Now it seems to me your problem is very similar, except you have no explicit
level number. But I think you could use a similar approach, where the same
template is used for each level of grouping and the only thing that changes
is the grouping key.
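A minimal sketch of that idea for the sample in the quoted message, assuming the heading paragraphs have already been mapped to a, b, c, d, e as described there, and using a hypothetical stylesheet parameter heads that lists the head names outermost first. The position in that list plays the role of the level number. When a level is omitted, group-starting-with still yields a first group, but it does not begin with a head of that level, so the template emits no div and simply tries the next level down:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: head element names, outermost level first -->
  <xsl:param name="heads" as="xs:string*" select="('a', 'b', 'c', 'd', 'e')"/>

  <xsl:template match="document">
    <document>
      <xsl:call-template name="group">
        <xsl:with-param name="sequence" select="*"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </document>
  </xsl:template>

  <xsl:template name="group">
    <xsl:param name="sequence" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:choose>
      <!-- Below the deepest head level: ordinary content, copy it as is -->
      <xsl:when test="$level gt count($heads)">
        <xsl:copy-of select="$sequence"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- The grouping key is the only thing that changes per level -->
        <xsl:variable name="head" as="xs:string" select="$heads[$level]"/>
        <xsl:for-each-group select="$sequence"
                            group-starting-with="*[local-name() = $head]">
          <xsl:choose>
            <!-- Group starts with a head of this level: make a div -->
            <xsl:when test="local-name() = $head">
              <xsl:element name="div-{$head}">
                <title><xsl:copy-of select="node()"/></title>
                <xsl:call-template name="group">
                  <xsl:with-param name="sequence" select="current-group() except ."/>
                  <xsl:with-param name="level" select="$level + 1"/>
                </xsl:call-template>
              </xsl:element>
            </xsl:when>
            <!-- Content before the first head, or a level omitted here:
                 no div, just try the next level down -->
            <xsl:otherwise>
              <xsl:call-template name="group">
                <xsl:with-param name="sequence" select="current-group()"/>
                <xsl:with-param name="level" select="$level + 1"/>
              </xsl:call-template>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input with an XSLT 2.0 processor, this should produce the div-a / div-b / div-c / div-d / div-e nesting of the required output in a single pass (whitespace aside), with no intermediate aa/bb/cc elements and no clean-up of empty divs.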

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: Jim_Albright@xxxxxxxxxxxx [mailto:Jim_Albright@xxxxxxxxxxxx] 
> Sent: 28 September 2004 13:07
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] Re: up-converting
> 
> I have a solution for the up-converting problem that I had. It isn't as
> elegant as I was hoping for, so maybe someone here can give me a few
> more pointers.
> Thanks again for including xsl:for-each-group, as that makes the
> solution much easier.
> 
> My general problem is converting a flat XML (WordML) document to one
> with hierarchy.
> After tossing out all of the formatting info, the first step is to map
> all the paragraphs that indicate divs to their appropriate level. I use
> a, b, c, d, ... as the new element names in order to make this more
> generic.
> The a, b, c, ... elements hold the head or title for the div.
> Divs may be nested: a, b, c, d.
> Some divs may be omitted: a, b, d.
> Divs may be followed by other divs or paragraphs. Paragraphs may contain
> spans.
> 
> Next, use xsl:for-each-group to wrap an aa element around each group
> that starts with an a element.
> Next, use xsl:for-each-group to wrap a bb element around each group
> that starts with a b element, renaming aa to aaa.
> ...
> Each of these steps builds the required hierarchy one level at a time.
> Since some divs may be omitted, I couldn't find a way to combine these
> steps.
> Next, the head/title is pulled out.
> Finally, any div with no head/title is tossed out.
> 
> Sample input:
> <?xml version="1.0" encoding="UTF-8"?>
> <document>
>         <a>level aaaa head 1</a>
>         <b>level bbbb head 2</b>
>         <c>level ccccc head 3</c>
>         <dfg>cc 4 blah</dfg>
>         <e>level eeee head 5 </e>
>         <fhh>cc blah 6</fhh>
>         <c>level ccccc head 7</c>
>         <df>cc 8 blah<kkk>kkk within df within c</kkk>
>         </df>
>         <d>level dddd head 9</d>
>         <iuo>dd 10 blah</iuo>
>         <jtt>dd blah 11</jtt>
>         <c>level ccccc head 12</c>
>         <df>cc 13 blah</df>
>         <e>cc level eeeee  head 14</e>
>         <fss>ee blah 15</fss>
>         <b>level bbbbb head 16</b>
>         <c>level ccccc  head 17</c>
>         <df>cc 18  blah</df>
>         <e>cc level eeeee head 19</e>
>         <fhy>ee blah 20</fhy>
> </document>
> 
> and the required output is:
> <?xml version="1.0" encoding="UTF-8"?>
> <document>
>    <div-a>
>       <title>level aaaa head 1</title>
>       <div-b>
>          <title>level bbbb head 2</title>
>          <div-c>
>             <title>level ccccc head 3</title>
>             <dfg>cc 4 blah</dfg>
>             <div-e>
>                <title>level eeee head 5 </title>
>                <fhh>cc blah 6</fhh>
>             </div-e>
>          </div-c>
>          <div-c>
>             <title>level ccccc head 7</title>
>             <df>cc 8 blah<kkk>kkk within df within c</kkk>
>             </df>
>             <div-d>
>                <title>level dddd head 9</title>
>                <iuo>dd 10 blah</iuo>
>                <jtt>dd blah 11</jtt>
>             </div-d>
>          </div-c>
>          <div-c>
>             <title>level ccccc head 12</title>
>             <df>cc 13 blah</df>
>             <div-e>
>                <title>cc level eeeee  head 14</title>
>                <fss>ee blah 15</fss>
>             </div-e>
>          </div-c>
>       </div-b>
>       <div-b>
>          <title>level bbbbb head 16</title>
>          <div-c>
>             <title>level ccccc  head 17</title>
>             <df>cc 18  blah</df>
>             <div-e>
>                <title>cc level eeeee head 19</title>
>                <fhy>ee blah 20</fhy>
>             </div-e>
>          </div-c>
>       </div-b>
>    </div-a>
> </document>
> 
> 
> 
> Next, use xsl:for-each-group to wrap an aa element around each group
> that starts with an a element:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
>   <xsl:template match="document">
>     <document>
>       <!-- one aa wrapper per group; each group starts at an a element -->
>       <xsl:for-each-group select="*" group-starting-with="a">
>         <aa>
>           <xsl:copy-of select="current-group()"/>
>         </aa>
>       </xsl:for-each-group>
>     </document>
>   </xsl:template>
>   <xsl:template match="@*|node()" name="copy-current-node">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
> </xsl:stylesheet>
> 
> 
> with output of:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <document>
>    <aa>
>       <a>level aaaa head 1</a>
>       <b>level bbbb head 2</b>
>       <c>level ccccc head 3</c>
>       <dfg>cc 4 blah</dfg>
>       <e>level eeee head 5 </e>
>       <fhh>cc blah 6</fhh>
>       <c>level ccccc head 7</c>
>       <df>cc 8 blah<kkk>kkk within df within c</kkk>
>       </df>
>       <d>level dddd head 9</d>
>       <iuo>dd 10 blah</iuo>
>       <jtt>dd blah 11</jtt>
>       <c>level ccccc head 12</c>
>       <df>cc 13 blah</df>
>       <e>cc level eeeee  head 14</e>
>       <fss>ee blah 15</fss>
>       <b>level bbbbb head 16</b>
>       <c>level ccccc  head 17</c>
>       <df>cc 18  blah</df>
>       <e>cc level eeeee head 19</e>
>       <fhy>ee blah 20</fhy>
>    </aa>
> </document>
> 
> Next, use xsl:for-each-group to wrap a bb element around each group
> that starts with a b element, renaming aa to aaa:
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
>   <xsl:template match="document">
>     <document>
>       <xsl:apply-templates/>
>     </document>
>   </xsl:template>
>   <xsl:template match="aa">
>     <aaa>
>       <!-- one bb wrapper per group; each group starts at a b element -->
>       <xsl:for-each-group select="*" group-starting-with="b">
>         <bb>
>           <xsl:copy-of select="current-group()"/>
>         </bb>
>       </xsl:for-each-group>
>     </aaa>
>   </xsl:template>
>   <xsl:template match="@*|node()" name="copy-current-node">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
> </xsl:stylesheet>
> 
> with output of:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <document>
>    <aaa>
>       <bb>
>          <a>level aaaa head 1</a>
>       </bb>
>       <bb>
>          <b>level bbbb head 2</b>
>          <c>level ccccc head 3</c>
>          <dfg>cc 4 blah</dfg>
>          <e>level eeee head 5 </e>
>          <fhh>cc blah 6</fhh>
>          <c>level ccccc head 7</c>
>          <df>cc 8 blah<kkk>kkk within df within c</kkk>
>  
>          </df>
>          <d>level dddd head 9</d>
>          <iuo>dd 10 blah</iuo>
>          <jtt>dd blah 11</jtt>
>          <c>level ccccc head 12</c>
>          <df>cc 13 blah</df>
>          <e>cc level eeeee  head 14</e>
>          <fss>ee blah 15</fss>
>       </bb>
>       <bb>
>          <b>level bbbbb head 16</b>
>          <c>level ccccc  head 17</c>
>          <df>cc 18  blah</df>
>          <e>cc level eeeee head 19</e>
>          <fhy>ee blah 20</fhy>
>       </bb>
>    </aaa>
> </document>
> 
> 
> Continue adding the levels:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
>   <xsl:template match="document">
>     <document>
>       <xsl:apply-templates/>
>     </document>
>   </xsl:template>
>   <xsl:template match="aaa">
>     <aaa>
>       <xsl:apply-templates/>
>     </aaa>
>   </xsl:template>
>   <xsl:template match="bb">
>     <bbb>
>       <!-- one cc wrapper per group; each group starts at a c element -->
>       <xsl:for-each-group select="*" group-starting-with="c">
>         <cc>
>           <xsl:copy-of select="current-group()"/>
>         </cc>
>       </xsl:for-each-group>
>     </bbb>
>   </xsl:template>
>   <xsl:template match="@*|node()" name="copy-current-node">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
> </xsl:stylesheet>
> 
> with output of:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <document>
>    <aaa>
>       <bbb>
>          <cc>
>             <a>level aaaa head 1</a>
>          </cc>
>       </bbb>
>       <bbb>
>          <cc>
>             <b>level bbbb head 2</b>
>          </cc>
>          <cc>
>             <c>level ccccc head 3</c>
>             <dfg>cc 4 blah</dfg>
>             <e>level eeee head 5 </e>
>             <fhh>cc blah 6</fhh>
>          </cc>
>          <cc>
>             <c>level ccccc head 7</c>
>             <df>cc 8 blah<kkk>kkk within df within c</kkk>
>  
>  
>             </df>
>             <d>level dddd head 9</d>
>             <iuo>dd 10 blah</iuo>
>             <jtt>dd blah 11</jtt>
>          </cc>
>          <cc>
>             <c>level ccccc head 12</c>
>             <df>cc 13 blah</df>
>             <e>cc level eeeee  head 14</e>
>             <fss>ee blah 15</fss>
>          </cc>
>       </bbb>
>       <bbb>
>          <cc>
>             <b>level bbbbb head 16</b>
>          </cc>
>          <cc>
>             <c>level ccccc  head 17</c>
>             <df>cc 18  blah</df>
>             <e>cc level eeeee head 19</e>
>             <fhy>ee blah 20</fhy>
>          </cc>
>       </bbb>
>    </aaa>
> </document>
> 
> ....
> 
> 
> Finally, at each aaa, see if there is a descendant a; if so, that is the
> title for this group, otherwise there is no title. Likewise for bbb, ccc,
> ddd and eee:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
>   <xsl:strip-space elements="*"/>
>   <xsl:template match="document">
>     <document>
>       <xsl:apply-templates/>
>     </document>
>   </xsl:template>
>   <xsl:template match="aaa">
>     <div-a>
>       <xsl:choose>
>         <xsl:when test="descendant::a">
>           <title>
>             <xsl:apply-templates select="descendant::a"/>
>           </title>
>           <xsl:apply-templates select="child::*"/>
>         </xsl:when>
>         <xsl:otherwise>
>           <xsl:apply-templates select="child::*"/>
>         </xsl:otherwise>
>       </xsl:choose>
>     </div-a>
>   </xsl:template>
>   <xsl:template match="bbb">
>     <xsl:choose>
>       <xsl:when test="descendant::b">
>         <div-b>
>           <title>
>             <xsl:apply-templates select="descendant::b"/>
>           </title>
>           <xsl:apply-templates select="child::*"/>
>         </div-b>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:apply-templates select="child::*"/>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:template>
>   <xsl:template match="ccc">
>     <xsl:choose>
>       <xsl:when test="descendant::c">
>         <div-c>
>           <title>
>             <xsl:apply-templates select="descendant::c"/>
>           </title>
>           <xsl:apply-templates select="descendant::*[preceding-sibling::c]"/>
>           <xsl:apply-templates select="child::*"/>
>         </div-c>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:apply-templates
>             select="child::*[not(c)]|descendant::*[preceding-sibling::c]"/>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:template>
>   <xsl:template match="ddd">
>     <xsl:choose>
>       <xsl:when test="descendant::d">
>         <div-d>
>           <title>
>             <xsl:apply-templates select="descendant::d"/>
>           </title>
>           <xsl:apply-templates select="descendant::*[preceding-sibling::d]"/>
>           <xsl:apply-templates select="child::*"/>
>         </div-d>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:apply-templates
>             select="child::*|descendant::*[preceding-sibling::d]"/>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:template>
>   <xsl:template match="eee">
>     <xsl:choose>
>       <xsl:when test="descendant::e">
>         <div-e>
>           <title>
>             <xsl:apply-templates select="descendant::e"/>
>           </title>
>           <xsl:apply-templates select="descendant::*[preceding-sibling::e]"/>
>           <xsl:apply-templates select="child::*"/>
>         </div-e>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:apply-templates
>             select="child::*|descendant::*[preceding-sibling::e]"/>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:template>
> 
>   <xsl:template match="a|b|c|d|e|f|g|h|i">
>     <xsl:apply-templates/>
>   </xsl:template>
>   <xsl:template match="@*|node()">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
> </xsl:stylesheet>
> 
> and then we can get rid of divs that have no title, thus solving the
> missing-div problem.
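The clean-up pass itself isn't shown. A minimal sketch of it, assuming that "get rid of" means dropping an untitled div-* wrapper while keeping its content (deleting the content too would lose paragraphs), and that the wrappers are the div-a ... div-e elements produced by the last stylesheet:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A div-* element with no title child: drop the wrapper, keep its content.
       The predicates give this pattern default priority 0.5, which beats
       the identity template's -0.5. -->
  <xsl:template match="*[starts-with(local-name(), 'div-')][not(title)]">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```

Because the unwrapped content is still processed by apply-templates, nested untitled divs are unwrapped as well, and titled divs inside them are copied through unchanged.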
> 
> 
> 
> Jim Albright
> 704 843-0582
> Wycliffe Bible Translators
