[xsl] Coding Optimization for big files

Subject: [xsl] Coding Optimization for big files
From: "Diego, Vitiello" <VDiego@xxxxxxxxxxxxx>
Date: Wed, 10 Mar 2004 15:38:48 +0200
Hi all,
I would like your help with a performance problem I have run into.
I'm sure there are workarounds for it, for example physically splitting the input XML file into several chunks (I have already tried this and it works fine).
But what I really need is only a logical split, or just better use of variables/keys in the XSL code.

The execution time for a small-to-medium file (size: 1.5 MB, containing 100 contracts and 1500 Gr22) is around 100 seconds.
The execution time for the biggest (worst-case) file (size: 25 MB, containing 1800 <contracts>, each with several <Gr22>, for a total of around 30000 Gr22) is 6 hours!
I tried to analyse the problem, and of course it is related to loading the variables allSUMGr22 and allContracts into memory and to the way the XSLT processor accesses them.
The goal would be, for example, to generate input XML files grouped by group of contracts (<SUM groupId='1'>), or to generate a different tag for each group (<SUM1>, <SUM2>, etc.).
I guess I need to define variables that don't require too much memory and that can still filter the 30000 items.
But I'm not sure that I can avoid defining big variables.

Are there any suggestions for this optimization?

Thanks in advance
Diego

TRANSFORM.XML

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output version="1.0" method="xml" indent="yes"/>

	<xsl:key name="gr22CustomerContractKey" match="/TIMM-MESSAGE/SUM/Gr22" use="concat(@customer,'|',@contract)"/>

	<xsl:variable name="allSUMGr22" select="/TIMM-MESSAGE/SUM/Gr22"/>
	<xsl:variable name="allContracts" select="/TIMM-MESSAGE/SUM/Gr22[count(. | key('gr22CustomerContractKey', concat(@customer,'|',@contract))[1])=1][IMD/servicecodeid='DNNUM'][IMD/productdes='8']"/>

	<xsl:template match="/">
		<doc_result>
			<xsl:for-each select="$allContracts">
				<xsl:variable name="indexCustomer" select="@customer"/>
				<xsl:variable name="indexContract" select="@contract"/>
				<contract>
					<xsl:call-template name="getNumber">
						<xsl:with-param name="pIndexCustomer" select="$indexCustomer"/>
						<xsl:with-param name="pIndexContract" select="$indexContract"/>
					</xsl:call-template>
				</contract>
			</xsl:for-each>
		</doc_result>
	</xsl:template>

	<xsl:template name="getNumber">
		<xsl:param name="pIndexCustomer"/>
		<xsl:param name="pIndexContract"/>
		<contract_number>
			<xsl:value-of select="$allSUMGr22[@customer=$pIndexCustomer][@contract=$pIndexContract]/IMD[productdes='8'][servicecodeid='DNNUM']/fulldesc"/>
		</contract_number>
	</xsl:template>
</xsl:stylesheet>
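
A possible variant using only keys, given here as a sketch and not as a measured fix. It assumes that the expensive step is the predicate filter over $allSUMGr22 inside getNumber, which is evaluated once per contract against all Gr22 elements. It reuses the existing gr22CustomerContractKey for that lookup, so neither big variable is needed. It also assumes that the customer and contract attributes never contain the '|' separator; if they did, two different pairs could produce the same key value.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output version="1.0" method="xml" indent="yes"/>

	<!-- Same key as in the original stylesheet: Gr22 indexed by customer|contract -->
	<xsl:key name="gr22CustomerContractKey" match="/TIMM-MESSAGE/SUM/Gr22" use="concat(@customer,'|',@contract)"/>

	<xsl:template match="/">
		<doc_result>
			<!-- Same selection as $allContracts: the first Gr22 of each customer|contract pair,
			     kept only if it carries the DNNUM / productdes 8 IMD -->
			<xsl:for-each select="/TIMM-MESSAGE/SUM/Gr22[count(. | key('gr22CustomerContractKey', concat(@customer,'|',@contract))[1])=1][IMD/servicecodeid='DNNUM'][IMD/productdes='8']">
				<contract>
					<contract_number>
						<!-- Key lookup replaces $allSUMGr22[@customer=...][@contract=...];
						     xsl:value-of still outputs the first matching fulldesc in document order -->
						<xsl:value-of select="key('gr22CustomerContractKey', concat(@customer,'|',@contract))/IMD[productdes='8'][servicecodeid='DNNUM']/fulldesc"/>
					</contract_number>
				</contract>
			</xsl:for-each>
		</doc_result>
	</xsl:template>
</xsl:stylesheet>
```

The output structure is meant to be the same as that of the stylesheet above. Whether this actually reduces the run time depends on how the XSLT processor implements xsl:key; many processors build the index once per document, but that is not guaranteed.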

XML structure

<TIMM-MESSAGE>
<SUM>
...other tags...
<Gr22 customer='1' contract='1'>
<IMD>
	<productdes>8</productdes>
	<servicecodeid>DNNUM</servicecodeid>
	<shortdesc></shortdesc>
	<fulldesc>number1</fulldesc>
</IMD>
...other tags...
</Gr22>
...other Gr22 related to the customer='1' contract='1'...

<Gr22 customer='1' contract='2'>
<IMD>
	<productdes>8</productdes>
	<servicecodeid>DNNUM</servicecodeid>
	<shortdesc></shortdesc>
	<fulldesc>number2</fulldesc>
</IMD>
...other tags...
</Gr22>
...other Gr22 related to the customer='1' contract='2'...

...other Gr22 related to the customer='1' for all the other contracts...

</SUM>
</TIMM-MESSAGE>

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list