Re: [xsl] building a hierarchical classification out of flat and redundant data

From: "Albert Juhé" <albertjuhe@xxxxxxxxx>
Date: Tue, 25 Jul 2006 10:01:12 +0200
Hi David,

Last week a nasty job landed on my desk, and the problem is the same.
I have this XML:

<modul>
<unit id="1">
<subunit>Rupturas</subunit>
<sub-subunit>sistema </sub-subunit>
<sub-subunit>incertidumbre</sub-subunit>
<subunit>Megatendencias</subunit>
<sub-subunit>Caracterizacisn</sub-subunit>
<sub-sub-subunit>1.2.1.1.</sub-sub-subunit>
<p>Text 1211</p>
<param>Text 2 1211</param>
<sub-sub-subunit>1.2.1.2.</sub-sub-subunit>
<sub-sub-subunit>1.2.1.3.</sub-sub-subunit>
<sub-subunit>Vectores</sub-subunit>
<sub-sub-subunit>1.2.2.1.</sub-sub-subunit>
<sub-sub-subunit>1.2.2.2.</sub-sub-subunit>
<sub-sub-subunit>1.2.2.3.</sub-sub-subunit>
<subunit>Perspectivas</subunit>
<sub-subunit>Ideologmas</sub-subunit>
<sub-sub-subunit>1.3.1.1.</sub-sub-subunit>
<sub-sub-subunit>1.3.1.2.</sub-sub-subunit>
<sub-subunit>controversia</sub-subunit>
<sub-sub-subunit>1.3.2.1.</sub-sub-subunit>
<sub-sub-subunit>1.3.2.2.</sub-sub-subunit>
</unit>
<unit id="2">
<p>Desafmos sociolaboral</p>
<subunit>Cantidad</subunit>
<p>Text Cantidad</p>
<sub-subunit>riqueza</sub-subunit>
<sub-subunit>paramso</sub-subunit>
<sub-subunit>materia</sub-subunit>
<sub-subunit>panorama a las perspectivas</sub-subunit>
<subunit>Calidad</subunit>
<sub-subunit>Polarizacisn</sub-subunit>
<sub-subunit>La cara</sub-subunit>
<sub-subunit>La cruz</sub-subunit>
<sub-subunit>Precarizacisn</sub-subunit>
<subunit>experiencia</subunit>
<sub-subunit>Ejes</sub-subunit>
<sub-subunit>Condiciones</sub-subunit>
<sub-sub-subunit>2.3.2.1.</sub-sub-subunit>
<sub-sub-subunit>2.3.2.2.</sub-sub-subunit>
<sub-sub-subunit>2.3.2.3.</sub-sub-subunit>
<subunit>paradigma</subunit>
<sub-subunit>civilizacisn</sub-subunit>
<sub-subunit>emplemsmo</sub-subunit>
<sub-subunit>Agenda</sub-subunit>
</unit>
</modul>

I have to convert it into a hierarchical XML structure inside each
unit element, under these conditions:
- Other elements can appear between the heading elements; they belong
to the nearest preceding heading sibling.
- The hierarchy is: unit, subunit, sub-subunit and sub-sub-subunit.

Expected result file:

<modul>
	<unit id="1">
		<subunit>
			<title>Rupturas</title>
			<sub-subunit>
				<title>sistema </title>
			</sub-subunit>
			<sub-subunit>
				<title>incertidumbre</title>
			</sub-subunit>
		</subunit>
		<subunit>
			<title>Megatendencias</title>
			<sub-subunit>
				<title>Caracterizacisn</title>
				<sub-sub-subunit>
					<title>1.2.1.1.</title>
					<p>Text 1211</p>
					<param>Text 2 1211</param>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>1.2.1.2.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>1.2.1.3.</title>
				</sub-sub-subunit>
			</sub-subunit>
			<sub-subunit>
				<title>Vectores</title>
				<sub-sub-subunit>
					<title>1.2.2.1.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>1.2.2.2.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>1.2.2.3.</title>
				</sub-sub-subunit>
			</sub-subunit>
		</subunit>
		<subunit>
			<title>Perspectivas</title>
			<sub-subunit>
				<title>Ideologmas</title>
				<sub-sub-subunit>
					<title>1.3.1.1.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>1.3.1.2.</title>
				</sub-sub-subunit>
			</sub-subunit>
			<sub-subunit>
				<title>controversia</title>
				<sub-sub-subunit>
					<title>1.3.2.1.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>1.3.2.2.</title>
				</sub-sub-subunit>
			</sub-subunit>
		</subunit>
	</unit>
	<unit id="2">
		<p>Desafmos sociolaboral</p>
		<subunit>
			<title>Cantidad</title>
			<p>Text Cantidad</p>
			<sub-subunit>
				<title>riqueza</title>
			</sub-subunit>
			<sub-subunit>
				<title>paramso</title>
			</sub-subunit>
			<sub-subunit>
				<title>materia</title>
			</sub-subunit>
			<sub-subunit>
				<title>panorama a las perspectivas</title>
			</sub-subunit>
		</subunit>
		<subunit>
			<title>Calidad</title>
			<sub-subunit>
				<title>Polarizacisn</title>
			</sub-subunit>
			<sub-subunit>
				<title>La cara</title>
			</sub-subunit>
			<sub-subunit>
				<title>La cruz</title>
			</sub-subunit>
			<sub-subunit>
				<title>Precarizacisn</title>
			</sub-subunit>
		</subunit>
		<subunit>
			<title>experiencia</title>
			<sub-subunit>
				<title>Ejes</title>
			</sub-subunit>
			<sub-subunit>
				<title>Condiciones</title>
				<sub-sub-subunit>
					<title>2.3.2.1.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>2.3.2.2.</title>
				</sub-sub-subunit>
				<sub-sub-subunit>
					<title>2.3.2.3.</title>
				</sub-sub-subunit>
			</sub-subunit>
		</subunit>
		<subunit>
			<title>paradigma</title>
			<sub-subunit>
				<title>civilizacisn</title>
			</sub-subunit>
			<sub-subunit>
				<title>emplemsmo</title>
			</sub-subunit>
			<sub-subunit>
				<title>Agenda</title>
			</sub-subunit>
		</subunit>
	</unit>
</modul>

This is my solution:

	<xsl:template match="modul">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates/>
		</xsl:copy>
	</xsl:template>

	<xsl:template match="unit">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:call-template name="process-node">
				<xsl:with-param name="node-father" select="name()"/>
			</xsl:call-template>
		</xsl:copy>
	</xsl:template>

	<!-- Copy elements -->
	<xsl:template match="*">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates/>
		</xsl:copy>
	</xsl:template>

	<!--
		Copy sibling elements one by one until the element whose
		generate-id() equals $target is reached (not called by the
		other templates shown here) -->
	<xsl:template name="get-block">
		<xsl:param name="context" select="."/>
		<xsl:param name="target"/>

		<xsl:if test="generate-id($context)!=$target">
			<xsl:apply-templates select="$context" mode="copia"/>
			<xsl:variable name="next-element"
select="$context/following-sibling::*[1]"/>
			<xsl:if test="$next-element">
				<xsl:call-template name="get-block">
					<xsl:with-param name="context" select="$next-element"/>
					<xsl:with-param name="target" select="$target"/>
				</xsl:call-template>
			</xsl:if>
		</xsl:if>

</xsl:template>

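	<!--
		If $context is a heading element, wrap it in an element of the
		same name with a title and its content, then continue with the
		next heading of the same level; otherwise copy it and move on
	-->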
	<xsl:template name="process-node">
		<xsl:param name="context" select="*[1]"/>
		<xsl:param name="node-father"/>

		<xsl:choose>
			<xsl:when test="$context[self::unit or self::subunit or
self::sub-subunit or self::sub-sub-subunit]">
				<xsl:variable name="node-type" select="name($context)"/>
				<xsl:element name="{$node-type}">
					<title><xsl:value-of select="$context"/></title>
					<xsl:call-template name="generate-block">
						<xsl:with-param name="context"
select="$context/following-sibling::*[1]"/>
						<xsl:with-param name="node-type" select="$node-type"/>
					</xsl:call-template>
				</xsl:element>

				<xsl:variable name="seguent-node"
select="$context/following-sibling::*[name()=$node-type][1]"/>

				<xsl:variable name="fathers-name">
					<xsl:call-template name="get-pare">
						<xsl:with-param name="unitat" select="$node-type"/>
					</xsl:call-template>
				</xsl:variable>

				<!-- Continue only if the next node has the same type and the same
parent as the current one -->
				<xsl:if test="$seguent-node and name($seguent-node)=$node-type and
(generate-id($seguent-node/preceding-sibling::*[name()=$fathers-name][1])=
generate-id($context/preceding-sibling::*[name()=$fathers-name][1]))">
					<xsl:call-template name="process-node">
						<xsl:with-param name="context" select="$seguent-node"/>
					</xsl:call-template>
				</xsl:if>

			</xsl:when>
			<xsl:otherwise>
				<xsl:apply-templates select="$context"/>
				<xsl:if test="$context/following-sibling::*">
					<xsl:call-template name="process-node">
						<xsl:with-param name="context"
select="$context/following-sibling::*[1]"/>
					</xsl:call-template>
				</xsl:if>
			</xsl:otherwise>
		</xsl:choose>
	</xsl:template>

	<xsl:template name="generate-block">
		<xsl:param name="context"/>
		<xsl:param name="node-type"/>

		<xsl:if test="$context">
			<!-- Where does processing stop? At an element of the same or a higher level -->
			<xsl:variable name="pares">
				<xsl:call-template name="get-ordre-unitat">
					<xsl:with-param name="unitat" select="$node-type"/>
				</xsl:call-template>
			</xsl:variable>
			<xsl:variable name="node-limit"
select="contains($pares,concat('*',name($context),'*'))"/>

			<xsl:if test="not($node-limit)">
				<xsl:choose>
					<xsl:when test="$context[self::unit or self::subunit or
self::sub-subunit or self::sub-sub-subunit]">
						<xsl:call-template name="process-node">
							<xsl:with-param name="context" select="$context"/>
						</xsl:call-template>
					</xsl:when>
					<xsl:otherwise>
						<xsl:apply-templates select="$context"/>
						<xsl:call-template name="generate-block">
							<xsl:with-param name="context"
select="$context/following-sibling::*[1]"/>
							<xsl:with-param name="node-type" select="$node-type"/>
						</xsl:call-template>
					</xsl:otherwise>
				</xsl:choose>
			</xsl:if>
		</xsl:if>

</xsl:template>

	<!-- Returns the levels at which a block of the given type ends (hierarchical order) -->
	<xsl:template name="get-ordre-unitat">
		<xsl:param name="unitat"/>

		<xsl:choose>
			<xsl:when test="$unitat='unit'">
				<xsl:value-of select="'*unit*'"/>
			</xsl:when>
			<xsl:when test="$unitat='subunit'">
				<xsl:value-of select="'*unit*subunit*'"/>
			</xsl:when>
			<xsl:when test="$unitat='sub-subunit'">
				<xsl:value-of select="'*unit*subunit*sub-subunit*'"/>
			</xsl:when>
			<xsl:when test="$unitat='sub-sub-subunit'">
				<xsl:value-of select="'*unit*subunit*sub-subunit*sub-sub-subunit*'"/>
			</xsl:when>
		</xsl:choose>

</xsl:template>

	<!-- Returns the name of the parent level -->
	<xsl:template name="get-pare">
		<xsl:param name="unitat"/>

		<xsl:choose>
			<xsl:when test="$unitat='unit'">
				<xsl:value-of select="''"/>
			</xsl:when>
			<xsl:when test="$unitat='subunit'">
				<xsl:value-of select="'unit'"/>
			</xsl:when>
			<xsl:when test="$unitat='sub-subunit'">
				<xsl:value-of select="'subunit'"/>
			</xsl:when>
			<xsl:when test="$unitat='sub-sub-subunit'">
				<xsl:value-of select="'sub-subunit'"/>
			</xsl:when>
		</xsl:choose>

</xsl:template>
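
For comparison, the same grouping can probably be written more compactly with keys. This is only a sketch, not the solution above: it assumes XSLT 1.0, that all headings are direct children of unit, and that no level is skipped (a sub-sub-subunit always follows a sub-subunit of the same subunit). The key name "kids" is made up for the example. Each element is indexed under the generate-id() of the heading it belongs to, so a heading can fetch its own content and sub-headings in document order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical key "kids": three declarations share one name, so a
       lookup returns the union of what they index. -->
  <!-- A sub-subunit belongs to the nearest preceding subunit. -->
  <xsl:key name="kids" match="unit/sub-subunit"
      use="generate-id(preceding-sibling::subunit[1])"/>
  <!-- A sub-sub-subunit belongs to the nearest preceding sub-subunit
       (assumption: levels are never skipped). -->
  <xsl:key name="kids" match="unit/sub-sub-subunit"
      use="generate-id(preceding-sibling::sub-subunit[1])"/>
  <!-- Any other element belongs to the nearest preceding heading. -->
  <xsl:key name="kids"
      match="unit/*[not(self::subunit or self::sub-subunit
                        or self::sub-sub-subunit)]"
      use="generate-id(preceding-sibling::*[self::subunit
             or self::sub-subunit or self::sub-sub-subunit][1])"/>

  <!-- Identity copy for everything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A unit keeps its attributes, the content that precedes its first
       subunit, and then its subunits. -->
  <xsl:template match="unit">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates
          select="*[not(self::subunit)][not(preceding-sibling::subunit)]
                  | subunit"/>
    </xsl:copy>
  </xsl:template>

  <!-- A heading becomes a container: its text moves into title, followed
       by everything indexed under its id. -->
  <xsl:template match="unit/subunit | unit/sub-subunit
                       | unit/sub-sub-subunit">
    <xsl:copy>
      <title><xsl:value-of select="."/></title>
      <xsl:apply-templates select="key('kids', generate-id())"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions this is intended to give the nested structure of the result file; if a level can be skipped in the real data, the second key would attach the element to a heading in an earlier subunit, so the recursive approach may be the safer one.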


2006/7/24, Georg Hohmann <georg.hohmann@xxxxxxxxx>:
Dear XSLT-Community,

I have a problem with a "strange" type of data that I have to
convert to a hierarchical XML structure.

My source is a huge XML file that represents a decimal
classification. It contains so-called documents, where each document
represents one node of the classification. Furthermore, each document
shows the parents of its node. It's a structure like this
(example taken from http://www.udcc.org):
...
<document>
       <tag1>3</tag1>
       <tag1a>Social Sciences</tag1a>
</document>
<document>
       <tag1>3</tag1>
       <tag1a>Social Sciences</tag1a>
       <tag2>32</tag2>
       <tag2a>Politics</tag2a>
</document>
<document>
       <tag1>3</tag1>
       <tag1a>Social Sciences</tag1a>
       <tag2>32</tag2>
       <tag2a>Politics</tag2a>
       <tag3>326</tag3>
       <tag3a>Slavery</tag3a>
</document>
...
As you can see, there is no hierarchical information in it apart from
the names and the sequence of the tags. In my real data I have up to 9
levels, though not in every document. My result should look like this
(or something similar):
...
<node id="3" name="Social Sciences">
  <node id="32" name="Politics">
     <node id="326" name="Slavery"/>
  </node>
</node>
...
I simply have no idea where to start to achieve this result. I
guess the first step would be to get rid of all the redundant
content, but I don't know how. And I can't figure out how to
build the hierarchical structure at the same time.

Does anyone have a good starting point for this?
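
One possible starting point for the question above, as an XSLT 1.0 sketch rather than a tested answer. It rests on several assumptions that are not confirmed by the message: the document elements sit directly under a single root element, code elements (tag1, tag2, ...) have names that do not end in "a" while name elements (tag1a, tag2a, ...) do, the name of a node is the last child of its document, each code is unique, and each node appears in exactly one document. The key name "by-parent" and the wrapper element "classification" are made up for the example. Every document is indexed under the code of its parent, which is its second-to-last code element, so the redundant ancestor entries never need to be removed explicitly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index each document by its parent's code: the second-to-last child
       whose name does not end in "a". Top-level documents have only one
       code element and are therefore not indexed. -->
  <xsl:key name="by-parent" match="document"
      use="*[substring(name(), string-length(name())) != 'a'][last() - 1]"/>

  <!-- Assumption: one root element holding all documents. Start with the
       documents that have no second level. -->
  <xsl:template match="/*">
    <classification>
      <xsl:apply-templates select="document[not(tag2)]"/>
    </classification>
  </xsl:template>

  <!-- Emit one node per document, then recurse into the documents that
       name this node's code as their parent. -->
  <xsl:template match="document">
    <xsl:variable name="code"
        select="*[substring(name(), string-length(name())) != 'a'][last()]"/>
    <node id="{$code}" name="{*[last()]}">
      <xsl:apply-templates select="key('by-parent', $code)"/>
    </node>
  </xsl:template>

</xsl:stylesheet>
```

Because the lookup goes through a key, the number of levels does not have to be known in advance; if the same node can occur in more than one document, the duplicates would first have to be filtered out.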
