RE: [xsl] Performance problems with grouping

Subject: RE: [xsl] Performance problems with grouping
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 1 Sep 2004 10:58:20 +0100
> Is it possible to use <xsl:key> on a node-set returned by the
> ext:node-set() function?

Assuming that the vendor's implementation of ext:node-set() is reasonably
conformant, yes. The key() function is defined to work on the tree
containing the context node, which can be any tree, including one created
using ext:node-set().
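
As a minimal sketch of that point: the key is declared at the top level as
usual, and an xsl:for-each switches the context node into the temporary tree
so that key() searches that tree rather than the source document. The variable
name "combined", the key name "by-package", the inline sample data, and the
EXSLT binding for the ext prefix are all illustrative assumptions; substitute
the node-set() namespace of your own processor.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://exslt.org/common"
    exclude-result-prefixes="ext">

  <xsl:output method="text"/>

  <!-- Keys are declared globally; key() searches whichever document
       contains the context node at the time of the call. -->
  <xsl:key name="by-package" match="object" use="@package"/>

  <!-- Hypothetical result tree fragment standing in for the combined file. -->
  <xsl:variable name="combined">
    <extended_root>
      <object name="object1" package="pack1" value="1"/>
      <object name="object2" package="pack2" value="1"/>
      <object name="object3" package="pack1" value="3"/>
    </extended_root>
  </xsl:variable>

  <xsl:template match="/">
    <!-- The outer for-each makes the root of the temporary tree the
         context node, so the key() call below looks in that tree. -->
    <xsl:for-each select="ext:node-set($combined)">
      <xsl:for-each select="key('by-package', 'pack1')">
        <xsl:value-of select="@name"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample data above this should list object1 and object3. If key() were
called while the context node was still in the source document, it would
search the source document instead and find nothing from the temporary tree.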

Michael Kay



> 
> What I am doing is the following:
> - I combine two XML files into one (by means of the document() function)
> - this temporary XML file is turned into a node-set by means of the
>   node-set() extension function
>
> Now I want to perform grouping on this temporary tree. I am using the
> preceding-sibling axis (a number of times), which produces the correct
> output, but it causes big performance problems. For an XML source of
> about 1MB it already takes a very long time (more than half an hour!).
> I know that xsl:key is a faster way, but I don't think I can use it for
> a temp tree, can I?
> 
> Temp tree:
> <extended_root>
>   <object name="object1" timestamp="15:00:00" instance="0" package="pack1" value="1"/>
>   <object name="object2" timestamp="15:00:00" instance="0" package="pack2" value="1"/>
>   <object name="object3" timestamp="15:00:00" instance="0" package="pack1" value="3"/>
>   <object name="object1" timestamp="15:00:00" instance="1" package="pack1" value="4"/>
>   <object name="object1" timestamp="15:30:00" instance="0" package="pack1" value="1"/>
>   <object name="object2" timestamp="15:30:00" instance="0" package="pack2" value="1"/>
>   <object name="object3" timestamp="15:30:00" instance="0" package="pack1" value="4"/>
>   <object name="object1" timestamp="15:30:00" instance="1" package="pack1" value="3"/>
> </extended_root>
> 
> Per package I need one header with an enumeration of all objects for
> this package:
> HEADER;pack1;object1;object3;object4
> 
> Under each header a grouping is needed per timestamp and per instance.
> All values that belong to this grouping are added to the entry:
> 
> HEADER;pack1;object1;object3
> pack1;;SCANNER;15:00:00;Instance0;1,3
> pack1;;SCANNER;15:00:00;Instance1;4,-1
> pack1;;SCANNER;15:30:00;Instance0;1,4
> pack1;;SCANNER;15:30:00;Instance1;3,-1
> 
> HEADER;pack2;object2
> pack2;;SCANNER;15:00:00;Instance0;1
> pack2;;SCANNER;15:30:00;Instance0;1
> 
> To achieve this I have the following template:
> 
> <xsl:template match="extended_root">
>   <xsl:for-each select="object">
>     <xsl:variable name="currentPackage" select="@package"/>
>     <xsl:variable name="currentInstance" select="@instance"/>
>     <xsl:variable name="currentDateTime" select="@timestamp"/>
>
>     <xsl:if test="not(preceding-sibling::object[@package=$currentPackage])">
>       <xsl:call-template name="write_header">
>         <xsl:with-param name="counterList"
>           select="/extended_root/object[(@package=$currentPackage) and (not(@name=preceding-sibling::object/@name))]/@name"/>
>       </xsl:call-template>
>     </xsl:if>
>
>     <xsl:if test="(not(preceding-sibling::object[(@package=$currentPackage) and (@instance=$currentInstance) and (@timestamp=$currentDateTime)]))">
>       <xsl:call-template name="write_datarecord">
>         <xsl:with-param name="counterList"
>           select="/extended_root/object[(@package=$currentPackage) and (not(@name=preceding-sibling::object/@name))]/@name"/>
>       </xsl:call-template>
>     </xsl:if>
>   </xsl:for-each>
> </xsl:template>
> 
> Writing the header is performed in an acceptable amount of time, but
> writing the records causes a lot of problems.
> 
> Does anybody have any suggestions on how I could improve performance?
> 
> Thanks !
> 
> Kind regards,
> ismael
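
One way the grouping in the quoted template might be rewritten with keys
(Muenchian grouping) is sketched below. This is not the poster's code: the key
names, the "|" separator in the composite key values, and the inlined header
and record output (in place of the write_header and write_datarecord
templates, which are not shown in the message) are assumptions made for
illustration. The "SCANNER" literal and the "-1" placeholder for a missing
value are copied from the expected output quoted above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- One key per grouping level; composite values are built with concat(). -->
  <xsl:key name="by-package" match="object" use="@package"/>
  <xsl:key name="by-package-name" match="object"
           use="concat(@package, '|', @name)"/>
  <xsl:key name="by-record" match="object"
           use="concat(@package, '|', @timestamp, '|', @instance)"/>
  <xsl:key name="by-cell" match="object"
           use="concat(@package, '|', @timestamp, '|', @instance, '|', @name)"/>

  <xsl:template match="extended_root">
    <!-- Visit only the first object of each package. -->
    <xsl:for-each select="object[generate-id() =
                          generate-id(key('by-package', @package)[1])]">
      <xsl:variable name="pkg" select="@package"/>
      <!-- Distinct object names within this package, in document order. -->
      <xsl:variable name="names" select="key('by-package', $pkg)
          [generate-id() = generate-id(
             key('by-package-name', concat(@package, '|', @name))[1])]/@name"/>

      <!-- Blank line between package blocks. -->
      <xsl:if test="position() &gt; 1">
        <xsl:text>&#10;</xsl:text>
      </xsl:if>

      <!-- Header line: HEADER;package;name1;name2;... -->
      <xsl:text>HEADER;</xsl:text>
      <xsl:value-of select="$pkg"/>
      <xsl:for-each select="$names">
        <xsl:text>;</xsl:text>
        <xsl:value-of select="."/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>

      <!-- One record per distinct (package, timestamp, instance). -->
      <xsl:for-each select="key('by-package', $pkg)
          [generate-id() = generate-id(
             key('by-record',
                 concat(@package, '|', @timestamp, '|', @instance))[1])]">
        <xsl:variable name="rec"
            select="concat(@package, '|', @timestamp, '|', @instance)"/>
        <xsl:value-of select="concat($pkg, ';;SCANNER;', @timestamp,
                                     ';Instance', @instance, ';')"/>
        <!-- One value per header column; -1 when the object is missing. -->
        <xsl:for-each select="$names">
          <xsl:variable name="cell"
              select="key('by-cell', concat($rec, '|', .))"/>
          <xsl:choose>
            <xsl:when test="$cell">
              <xsl:value-of select="$cell/@value"/>
            </xsl:when>
            <xsl:otherwise>-1</xsl:otherwise>
          </xsl:choose>
          <xsl:if test="position() != last()">,</xsl:if>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

As written, the sketch assumes the extended_root document is the input. To run
it against a temporary tree instead, the template would be invoked with
something like xsl:apply-templates select="ext:node-set($temp)/extended_root",
where $temp is a hypothetical variable holding the combined tree. Each key
lookup replaces a preceding-sibling scan, so on processors that index keys the
cost should grow roughly in proportion to the number of objects rather than
with its square.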
