RE: [xsl] building a hierarchical classification out of flat and redundant data

Subject: RE: [xsl] building a hierarchical classification out of flat and redundant data
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 24 Jul 2006 12:43:36 +0100
The key to this is to do a depth-first recursive traversal of the tree by
starting with the root, and for each node processing its children using
xsl:apply-templates, but with one key difference: you need to select the
logical root and the logical children, rather than the XML root and the XML
children.

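I'm not sure how you identify the logical root in your structure. The
logical children of a node N are those nodes that have a child element equal
(in name and value) to every child element of N, and that sit exactly one
level deeper than N. In XSLT 2.0, assuming each level adds two elements
(tagN and tagNa) as in your sample:

<xsl:template match="document">
  <xsl:variable name="this" select="."/>
  <!-- the count() test stops N from selecting itself or deeper descendants -->
  <xsl:apply-templates select="//document[
     count(*) = count($this/*) + 2 and
     (every $c in $this/* satisfies
        (some $d in ./* satisfies deep-equal($c, $d)))]"/>
</xsl:template>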

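For reference, a complete stylesheet built around this idea might look like the sketch below. It rests on assumptions taken only from the sample data: the logical roots are the documents with exactly two child elements, every level adds one code element and one name element, and the last pair in a document describes the node itself. The <classification> wrapper element is made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Start at the logical roots: documents holding a single code/name pair
       (assumption based on the sample data). -->
  <xsl:template match="/">
    <classification>
      <xsl:apply-templates select="//document[count(*) = 2]"/>
    </classification>
  </xsl:template>

  <xsl:template match="document">
    <xsl:variable name="this" select="."/>
    <!-- The last two child elements carry this node's own code and name. -->
    <node id="{*[last() - 1]}" name="{*[last()]}">
      <!-- Logical children: documents one level deeper that contain a copy
           of every child element of the current document. -->
      <xsl:apply-templates select="//document[
          count(*) = count($this/*) + 2 and
          (every $c in $this/* satisfies
             (some $d in * satisfies deep-equal($c, $d)))]"/>
    </node>
  </xsl:template>

</xsl:stylesheet>
```

Each document is compared against every other document at each step, so on a very large file this is likely to be slow; it is meant to show the traversal, not to be the fastest way to do it.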

It's a bit more difficult in 1.0, but I hope you get the idea. 
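
One possible way to approach it in 1.0, where quantified expressions and deep-equal() are not available, is to link the documents through a key instead of comparing all child elements. This is a different technique from the one described above, and it only works under extra assumptions: the classification codes are unique across the whole file, and the code and name elements always appear in order, with the last pair describing the node itself. The key name "by-parent-code" and the <classification> wrapper are made up for the sketch.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every non-root document by the code of its parent, which is the
       fourth child element counted from the end. -->
  <xsl:key name="by-parent-code"
           match="document[count(*) &gt; 2]"
           use="*[last() - 3]"/>

  <!-- Start at the logical roots: documents with a single code/name pair. -->
  <xsl:template match="/">
    <classification>
      <xsl:apply-templates select="//document[count(*) = 2]"/>
    </classification>
  </xsl:template>

  <xsl:template match="document">
    <!-- The last two child elements carry this node's own code and name. -->
    <node id="{*[last() - 1]}" name="{*[last()]}">
      <!-- Logical children: documents whose parent code is this node's code. -->
      <xsl:apply-templates select="key('by-parent-code', *[last() - 1])"/>
    </node>
  </xsl:template>

</xsl:stylesheet>
```

If the same code can occur under two different parents, the key lookup would attach the wrong documents, and the full comparison of child elements would be needed again.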

Michael Kay
http://www.saxonica.com/  



> -----Original Message-----
> From: Georg Hohmann [mailto:georg.hohmann@xxxxxxxxx] 
> Sent: 24 July 2006 11:43
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] building a hierarchical classification out of 
> flat and redundant data
> 
> Dear XSLT-Community,
> 
> I have a problem with some "strange" data that I have 
> to convert to a hierarchical XML structure.
> 
> My source is a huge XML file which represents a decimal 
> classification. It contains so-called documents, where each 
> document represents one node of the classification. 
> Furthermore, each document shows the direct parents of a 
> node. It's a structure like this (example taken from 
> http://www.udcc.org):
> ...
> <document>
> 	<tag1>3</tag1>
> 	<tag1a>Social Sciences</tag1a>
> </document>
> <document>
> 	<tag1>3</tag1>
> 	<tag1a>Social Sciences</tag1a>
> 	<tag2>32</tag2>
> 	<tag2a>Politics</tag2a>
> </document>
> <document>
> 	<tag1>3</tag1>
> 	<tag1a>Social Sciences</tag1a>
> 	<tag2>32</tag2>
> 	<tag2a>Politics</tag2a>
> 	<tag3>326</tag3>
> 	<tag3a>Slavery</tag3a>
> </document>
> ...
> As you can see, there is no hierarchical information in it 
> apart from the names and the sequence of the tags. In my real 
> data I have up to 9 levels, but not in every document. My result 
> should look like this (or something similar):
> ...
> <node id="3" name="Social Sciences">
>    <node id="32" name="Politics">
>       <node id="326" name="Slavery"/>
>    </node>
> </node>
> ...
> I simply have no idea where to start to achieve this 
> result. I guess the first step would be to get rid of all 
> the redundant content, but I don't know how. And I can't 
> even figure out how to build the hierarchical structure at the 
> same time.
> 
> Does anyone have a good starting point for this?
