Subject: RE: [xsl] Re: up-converting
From: Jim_Albright@xxxxxxxxxxxx
Date: Tue, 28 Sep 2004 08:07:15 -0400

I have a solution for the up-converting problem that I had. It isn't as elegant as I was hoping for. Maybe someone here can give me a few more pointers. Thanks again for including the for-each-group structure, as that makes the solution much easier.

My general problem is conversion of a flat XML (WordML) document to one with hierarchy.

After tossing out all of the formatting info, the first step is to map all the paragraphs that indicate divs to their appropriate level. I use a, b, c, d, ... for the new element names in order to make this more generic. The a, b, c, ... elements hold the head or title for the div.
The divs may be nested: a, b, c, d.
Some div levels may be omitted: a, b, d.
Divs may be followed by other divs or paragraphs.
Paragraphs may contain spans.

Next use the for-each-group structure to put an aa element around the a elements.
Next use the for-each-group structure to put a bb element around the b elements, with aaa instead of aa.
...
Each of these steps builds the required hierarchy one level at a time. Since some divs may be omitted I couldn't find a way to combine these steps.
Next the head/title is pulled out.
Toss out any div with no head/title.

sample input

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <a>level aaaa head 1</a>
  <b>level bbbb head 2</b>
  <c>level ccccc head 3</c>
  <dfg>cc 4 blah</dfg>
  <e>level eeee head 5 </e>
  <fhh>cc blah 6</fhh>
  <c>level ccccc head 7</c>
  <df>cc 8 blah<kkk>kkk within df within c</kkk> </df>
  <d>level dddd head 9</d>
  <iuo>dd 10 blah</iuo>
  <jtt>dd blah 11</jtt>
  <c>level ccccc head 12</c>
  <df>cc 13 blah</df>
  <e>cc level eeeee head 14</e>
  <fss>ee blah 15</fss>
  <b>level bbbbb head 16</b>
  <c>level ccccc head 17</c>
  <df>cc 18 blah</df>
  <e>cc level eeeee head 19</e>
  <fhy>ee blah 20</fhy>
</document>

and the required output is

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <div-a>
    <title>level aaaa head 1</title>
    <div-b>
      <title>level bbbb head 2</title>
      <div-c>
        <title>level ccccc head 3</title>
        <dfg>cc 4 blah</dfg>
        <div-e>
          <title>level eeee head 5 </title>
          <fhh>cc blah 6</fhh>
        </div-e>
      </div-c>
      <div-c>
        <title>level ccccc head 7</title>
        <df>cc 8 blah<kkk>kkk within df within c</kkk> </df>
        <div-d>
          <title>level dddd head 9</title>
          <iuo>dd 10 blah</iuo>
          <jtt>dd blah 11</jtt>
        </div-d>
      </div-c>
      <div-c>
        <title>level ccccc head 12</title>
        <df>cc 13 blah</df>
        <div-e>
          <title>cc level eeeee head 14</title>
          <fss>ee blah 15</fss>
        </div-e>
      </div-c>
    </div-b>
    <div-b>
      <title>level bbbbb head 16</title>
      <div-c>
        <title>level ccccc head 17</title>
        <df>cc 18 blah</df>
        <div-e>
          <title>cc level eeeee head 19</title>
          <fhy>ee blah 20</fhy>
        </div-e>
      </div-c>
    </div-b>
  </div-a>
</document>

Next use the for-each-group structure to put an aa element around the a elements.

<?xml version="1.0" encoding="UTF-8"?>
<!-- xsl:for-each-group requires an XSLT 2.0 processor -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- Start a new aa wrapper at every a element -->
  <xsl:template match="document">
    <document>
      <xsl:for-each-group select="*" group-starting-with="a">
        <aa>
          <xsl:for-each select="current-group()">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </aa>
      </xsl:for-each-group>
    </document>
  </xsl:template>
  <!-- Identity template -->
  <xsl:template match="@*|node()" name="copy-current-node">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <aa>
    <a>level aaaa head 1</a>
    <b>level bbbb head 2</b>
    <c>level ccccc head 3</c>
    <dfg>cc 4 blah</dfg>
    <e>level eeee head 5 </e>
    <fhh>cc blah 6</fhh>
    <c>level ccccc head 7</c>
    <df>cc 8 blah<kkk>kkk within df within c</kkk> </df>
    <d>level dddd head 9</d>
    <iuo>dd 10 blah</iuo>
    <jtt>dd blah 11</jtt>
    <c>level ccccc head 12</c>
    <df>cc 13 blah</df>
    <e>cc level eeeee head 14</e>
    <fss>ee blah 15</fss>
    <b>level bbbbb head 16</b>
    <c>level ccccc head 17</c>
    <df>cc 18 blah</df>
    <e>cc level eeeee head 19</e>
    <fhy>ee blah 20</fhy>
  </aa>
</document>

Next use the for-each-group structure to put a bb element around the b elements, with aaa instead of aa.

<?xml version="1.0" encoding="UTF-8"?>
<!-- xsl:for-each-group requires an XSLT 2.0 processor -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>
  <!-- Rename aa to aaa and start a new bb wrapper at every b element -->
  <xsl:template match="aa">
    <aaa>
      <xsl:for-each-group select="*" group-starting-with="b">
        <bb>
          <xsl:for-each select="current-group()">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </bb>
      </xsl:for-each-group>
    </aaa>
  </xsl:template>
  <xsl:template match="@*|node()" name="copy-current-node">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <aaa>
    <bb>
      <a>level aaaa head 1</a>
    </bb>
    <bb>
      <b>level bbbb head 2</b>
      <c>level ccccc head 3</c>
      <dfg>cc 4 blah</dfg>
      <e>level eeee head 5 </e>
      <fhh>cc blah 6</fhh>
      <c>level ccccc head 7</c>
      <df>cc 8 blah<kkk>kkk within df within c</kkk> </df>
      <d>level dddd head 9</d>
      <iuo>dd 10 blah</iuo>
      <jtt>dd blah 11</jtt>
      <c>level ccccc head 12</c>
      <df>cc 13 blah</df>
      <e>cc level eeeee head 14</e>
      <fss>ee blah 15</fss>
    </bb>
    <bb>
      <b>level bbbbb head 16</b>
      <c>level ccccc head 17</c>
      <df>cc 18 blah</df>
      <e>cc level eeeee head 19</e>
      <fhy>ee blah 20</fhy>
    </bb>
  </aaa>
</document>

continue adding the levels

<?xml version="1.0" encoding="UTF-8"?>
<!-- xsl:for-each-group requires an XSLT 2.0 processor -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>
  <xsl:template match="aaa">
    <aaa>
      <xsl:apply-templates/>
    </aaa>
  </xsl:template>
  <!-- Rename bb to bbb and start a new cc wrapper at every c element -->
  <xsl:template match="bb">
    <bbb>
      <xsl:for-each-group select="*" group-starting-with="c">
        <cc>
          <xsl:for-each select="current-group()">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </cc>
      </xsl:for-each-group>
    </bbb>
  </xsl:template>
  <xsl:template match="@*|node()" name="copy-current-node">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <aaa>
    <bbb>
      <cc>
        <a>level aaaa head 1</a>
      </cc>
    </bbb>
    <bbb>
      <cc>
        <b>level bbbb head 2</b>
      </cc>
      <cc>
        <c>level ccccc head 3</c>
        <dfg>cc 4 blah</dfg>
        <e>level eeee head 5 </e>
        <fhh>cc blah 6</fhh>
      </cc>
      <cc>
        <c>level ccccc head 7</c>
        <df>cc 8 blah<kkk>kkk within df within c</kkk> </df>
        <d>level dddd head 9</d>
        <iuo>dd 10 blah</iuo>
        <jtt>dd blah 11</jtt>
      </cc>
      <cc>
        <c>level ccccc head 12</c>
        <df>cc 13 blah</df>
        <e>cc level eeeee head 14</e>
        <fss>ee blah 15</fss>
      </cc>
    </bbb>
    <bbb>
      <cc>
        <b>level bbbbb head 16</b>
      </cc>
      <cc>
        <c>level ccccc head 17</c>
        <df>cc 18 blah</df>
        <e>cc level eeeee head 19</e>
        <fhy>ee blah 20</fhy>
      </cc>
    </bbb>
  </aaa>
</document>

....

Finally, at aaa see if there is a descendant a. If so, that is the title for this group; otherwise there is no title.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>
  <xsl:template match="aaa">
    <div-a>
      <xsl:choose>
        <xsl:when test="descendant::a">
          <title>
            <xsl:apply-templates select="descendant::a"/>
          </title>
          <xsl:apply-templates select="child::*"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="child::*"/>
        </xsl:otherwise>
      </xsl:choose>
    </div-a>
  </xsl:template>
  <xsl:template match="bbb">
    <xsl:choose>
      <xsl:when test="descendant::b">
        <div-b>
          <title>
            <xsl:apply-templates select="descendant::b"/>
          </title>
          <xsl:apply-templates select="child::*"/>
        </div-b>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="ccc">
    <xsl:choose>
      <xsl:when test="descendant::c">
        <div-c>
          <title>
            <xsl:apply-templates select="descendant::c"/>
          </title>
          <xsl:apply-templates select="descendant::*[preceding-sibling::c]"/>
          <xsl:apply-templates select="child::*"/>
        </div-c>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*[not(c)]|descendant::*[preceding-sibling::c]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="ddd">
    <xsl:choose>
      <xsl:when test="descendant::d">
        <div-d>
          <title>
            <xsl:apply-templates select="descendant::d"/>
          </title>
          <xsl:apply-templates select="descendant::*[preceding-sibling::d]"/>
          <xsl:apply-templates select="child::*"/>
        </div-d>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*|descendant::*[preceding-sibling::d]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="eee">
    <xsl:choose>
      <xsl:when test="descendant::e">
        <div-e>
          <title>
            <xsl:apply-templates select="descendant::e"/>
          </title>
          <xsl:apply-templates select="descendant::*[preceding-sibling::e]"/>
          <xsl:apply-templates select="child::*"/>
        </div-e>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*|descendant::*[preceding-sibling::e]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- Head elements contribute only their content (used inside title) -->
  <xsl:template match="a|b|c|d|e|f|g|h|i">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

and then we can get rid of divs that have no title, thus solving the missing div problem.

Jim Albright
704 843-0582
Wycliffe Bible Translators
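
On the remark above that the passes could not be combined because some levels may be omitted: one possible way to do it in a single pass is a recursive named template that groups at one level and, when a group does not start with that level's head, simply hands the group down to the next level without a wrapper. This is only a sketch under stated assumptions, not the method used in this post. It assumes an XSLT 2.0 processor, that the heads are named exactly a to e, and that everything else is body content. The template name `nest` and the variable `levels` are hypothetical names introduced here for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Assumption: head element names, outermost level first -->
  <xsl:variable name="levels" as="xs:string*" select="('a', 'b', 'c', 'd', 'e')"/>

  <xsl:template match="document">
    <document>
      <xsl:call-template name="nest">
        <xsl:with-param name="nodes" select="*"/>
        <xsl:with-param name="depth" select="1"/>
      </xsl:call-template>
    </document>
  </xsl:template>

  <!-- Hypothetical helper: nests $nodes starting at level number $depth -->
  <xsl:template name="nest">
    <xsl:param name="nodes" as="element()*"/>
    <xsl:param name="depth" as="xs:integer"/>
    <xsl:choose>
      <!-- Past the deepest level: what is left is plain body content -->
      <xsl:when test="$depth gt count($levels)">
        <xsl:copy-of select="$nodes"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="head" select="$levels[$depth]"/>
        <xsl:for-each-group select="$nodes"
            group-starting-with="*[local-name() eq $head]">
          <xsl:choose>
            <!-- Group opens with this level's head: wrap it in a div -->
            <xsl:when test="local-name(current-group()[1]) eq $head">
              <xsl:element name="div-{$head}">
                <title>
                  <xsl:copy-of select="current-group()[1]/node()"/>
                </title>
                <xsl:call-template name="nest">
                  <xsl:with-param name="nodes"
                      select="current-group()[position() gt 1]"/>
                  <xsl:with-param name="depth" select="$depth + 1"/>
                </xsl:call-template>
              </xsl:element>
            </xsl:when>
            <!-- No head at this level (level omitted): try the next level -->
            <xsl:otherwise>
              <xsl:call-template name="nest">
                <xsl:with-param name="nodes" select="current-group()"/>
                <xsl:with-param name="depth" select="$depth + 1"/>
              </xsl:call-template>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because a wrapper is only written when a head is actually present, there should be no title-less divs to toss out afterwards. Applied to the sample input, a c group with no d in it falls through the d level and picks up its e as a direct child div, which is the nesting the required output asks for.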