[xsl] structure a flat file

Subject: [xsl] structure a flat file
From: "Jakob Fix" <jakob.fix@xxxxxxxxx>
Date: Wed, 25 Jun 2008 20:16:19 +0200

Hi all,

I know there is quite a lot of information about structuring a flat file
(I basically sleep with Michael Kay's XSLT 2.0 and XPath 2.0 under my
pillow, and the dpawson.co.uk XSLT FAQ is my browser's opening page :)),
but I couldn't find anything that relates to this specific case, or at
least I couldn't make the connection.


Here is the input (see the end of this message for a more complex and
more confusing example). @Level indicates the structural depth of the
node:

<Section>
       <Coordinates Level="1" Order="1" Label_E="1995">YEA=1995</Coordinates>
       <Coordinates Level="2" Order="1"
Label_E="Public">SUB=EduExpnd_T1a</Coordinates>
       <Coordinates Level="1" Order="2" Label_E="1995">YEA=1995</Coordinates>
       <Coordinates Level="2" Order="2"
Label_E="Private">SUB=EduExpnd_T1b</Coordinates>
       <Coordinates Level="1" Order="3" Label_E="1995">YEA=1995</Coordinates>
       <Coordinates Level="2" Order="3"
Label_E="Total">SUB=EduExpnd_T1c</Coordinates>

       <Coordinates Level="1" Order="4" Label_E="2004">YEA=2004</Coordinates>
       <Coordinates Level="2" Order="4"
Label_E="Public">SUB=EduExpnd_T1a</Coordinates>
       <Coordinates Level="1" Order="5" Label_E="2004">YEA=2004</Coordinates>
       <Coordinates Level="2" Order="5"
Label_E="Private">SUB=EduExpnd_T1b</Coordinates>
       <Coordinates Level="1" Order="6" Label_E="2004">YEA=2004</Coordinates>
       <Coordinates Level="2" Order="6"
Label_E="Total">SUB=EduExpnd_T1c</Coordinates>
</Section>


This is the required output (let's disregard the mixed content; all
text will be wrapped in an element):

<Section>
       <Coordinates Level="1" Order="1" Label_E="1995">YEA=1995
               <Coordinates Level="2" Order="1"
Label_E="Public">SUB=EduExpnd_T1a</Coordinates>
               <Coordinates Level="2" Order="2"
Label_E="Private">SUB=EduExpnd_T1b</Coordinates>
               <Coordinates Level="2" Order="3"
Label_E="Total">SUB=EduExpnd_T1c</Coordinates>
       </Coordinates>

       <Coordinates Level="1" Order="4" Label_E="2004">YEA=2004
               <Coordinates Level="2" Order="4"
Label_E="Public">SUB=EduExpnd_T1a</Coordinates>
               <Coordinates Level="2" Order="5"
Label_E="Private">SUB=EduExpnd_T1b</Coordinates>
               <Coordinates Level="2" Order="6"
Label_E="Total">SUB=EduExpnd_T1c</Coordinates>
       </Coordinates>
</Section>




I am trying to use <xsl:for-each-group> but can't really get to grips with it.

My idea is to first find the nodes that represent the start of a
group, like this:

       for $x in distinct-values(//Section[1]/Coordinates[@Level='1'])
       return //Section[1]/Coordinates[@Level='1'][contains(.,$x)][1]

However, I do not know which of the four grouping variants (group-by,
group-adjacent, group-starting-with, group-ending-with) to use from
there, or how.  Any help is very much appreciated.
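
To make the question more concrete, here is a rough, untested sketch of
one direction that might work. It is not a known-good solution. It
treats all elements sharing an @Order value as one "row" and uses
group-adjacent to merge neighbouring rows that carry the same entry
(same @Label_E and same text) at the current level, then recurses one
level deeper. The template name "nest" and the names "items", "level",
"head" and "deeper" are made up for illustration. It assumes every row
has exactly one entry at each level from 1 down to its deepest level.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Start at level 1 with all Coordinates children of the Section. -->
  <xsl:template match="Section">
    <xsl:copy>
      <xsl:call-template name="nest">
        <xsl:with-param name="items" select="Coordinates"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: nests a flat run of Coordinates elements. -->
  <xsl:template name="nest">
    <xsl:param name="items" as="element(Coordinates)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <!-- The grouping key of each element is the label and text of the
         entry at $level that shares its @Order. Adjacent elements with
         equal keys end up in the same group. -->
    <xsl:for-each-group select="$items"
        group-adjacent="string-join(
            for $o in @Order
            return $items[@Level = $level][@Order = $o]
                   /concat(@Label_E, '|', .),
            '')">
      <!-- First entry at this level represents the merged group. -->
      <xsl:variable name="head"
          select="current-group()[@Level = $level][1]"/>
      <!-- Everything deeper in the group becomes its content. -->
      <xsl:variable name="deeper"
          select="current-group()[@Level &gt; $level]"/>
      <xsl:for-each select="$head">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:value-of select="."/>
          <xsl:call-template name="nest">
            <xsl:with-param name="items" select="$deeper"/>
            <xsl:with-param name="level" select="$level + 1"/>
          </xsl:call-template>
        </xsl:copy>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Because the grouping is adjacent rather than by value, two runs with the
same label that are separated by other rows would stay separate. One
side effect: identical neighbouring entries at the deepest level would
also be merged into one, which may or may not be wanted.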


Here is another extract from an input file that is a bit richer, but
shows the same characteristics:

<Section>
               <Coordinates Level="1" Order="1" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="1" Label_E="2006"/>
               <Coordinates Level="3" Order="1"
Label_E="2006">TIME=2006</Coordinates>
               <Coordinates Level="1" Order="2" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="2" Label_E="2007"/>
               <Coordinates Level="3" Order="2"
Label_E="2007">TIME=2007</Coordinates>
               <Coordinates Level="1" Order="3" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="3" Label_E="2007$$$"/>
               <Coordinates Level="3" Order="3"
Label_E="Q2">TIME=2007Q2</Coordinates>
               <Coordinates Level="1" Order="4" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="4" Label_E="2007$$$"/>
               <Coordinates Level="3" Order="4"
Label_E="Q3">TIME=2007Q3</Coordinates>
               <Coordinates Level="1" Order="5" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="5" Label_E="2007$$$"/>
               <Coordinates Level="3" Order="5"
Label_E="Q4">TIME=2007Q4</Coordinates>
               <Coordinates Level="1" Order="6" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="6" Label_E="2008$$$"/>
               <Coordinates Level="3" Order="6"
Label_E="Q1">TIME=2008Q1</Coordinates>
               <Coordinates Level="1" Order="7" Label_E="Imports:
F.o.b.">FLOW=IMP</Coordinates>
               <Coordinates Level="2" Order="7" Label_E="2007"/>
               <Coordinates Level="3" Order="7"
Label_E="Sep">TIME=2007M9</Coordinates>
               [...]
</Section>


This is to become:

<Section>
               <Coordinates Level="1" Order="1" Label_E="Imports:
F.o.b.">FLOW=IMP
                       <Coordinates Level="2" Order="1" Label_E="2006">
                               <Coordinates Level="3" Order="1"
Label_E="2006">TIME=2006</Coordinates>
                       </Coordinates>

                       <Coordinates Level="2" Order="2" Label_E="2007">
                               <Coordinates Level="3" Order="2"
Label_E="2007">TIME=2007</Coordinates>
                       </Coordinates>

                       <Coordinates Level="2" Order="3" Label_E="2007$$$">
                               <Coordinates Level="3" Order="3"
Label_E="Q2">TIME=2007Q2</Coordinates>
                               <Coordinates Level="3" Order="4"
Label_E="Q3">TIME=2007Q3</Coordinates>
                               <Coordinates Level="3" Order="5"
Label_E="Q4">TIME=2007Q4</Coordinates>
                       </Coordinates>

                       <Coordinates Level="2" Order="6" Label_E="2008$$$">
                               <Coordinates Level="3" Order="6"
Label_E="Q1">TIME=2008Q1</Coordinates>
                       </Coordinates>

                       <Coordinates Level="2" Order="7" Label_E="2007">
                               <Coordinates Level="3" Order="7"
Label_E="Sep">TIME=2007M9</Coordinates>
                               [...]
                       </Coordinates>
               [...]
               </Coordinates>
</Section>


--
cheers,
Jakob.
