Subject: Grouping problem with large files in .Net
From: "Frederik Willaert" <f.w@xxxxxxxxxxx>
Date: Mon, 7 Jun 2004 01:46:35 +0200 (Romance Daylight Time)
Hi,

I have a problem with grouping large record-style XML documents using the .Net XslTransform class.

My source document has the following structure:

    <REPORT>
      <ROW>
        <CUSTOMER>XXX</CUSTOMER>
        <ACCOUNT>YYY</ACCOUNT>
        <HOURNUMBER>1</HOURNUMBER>
        <VALUE1>...</VALUE1>
        <VALUE2>...</VALUE2>
        <VALUE3>...</VALUE3>
        <!-- ... -->
      </ROW>
      <ROW>
        <!-- ... -->
      </ROW>
      <!-- ... -->
    </REPORT>

The stylesheet I'm executing is the following:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

      <xsl:key name="rows-by-customer" match="/REPORT/ROW" use="CUSTOMER"/>
      <xsl:key name="rows-by-customer-and-account" match="/REPORT/ROW"
               use="concat(CUSTOMER,'+',ACCOUNT)"/>

      <xsl:template match="/REPORT">
        <Report>
          <xsl:for-each select="ROW[generate-id() =
              generate-id(key('rows-by-customer', CUSTOMER)[1])]">
            <xsl:variable name="customer" select="CUSTOMER" />
            <Customer Name="{$customer}">
              <xsl:for-each select="key('rows-by-customer', $customer)[generate-id() =
                  generate-id(key('rows-by-customer-and-account',
                                  concat(CUSTOMER,'+',ACCOUNT))[1])]">
                <xsl:variable name="account" select="ACCOUNT" />
                <Account Name="{$account}">
                  <xsl:for-each select="key('rows-by-customer-and-account',
                                            concat(CUSTOMER,'+',$account))">
                    <xsl:copy-of select="." />
                  </xsl:for-each>
                </Account>
              </xsl:for-each>
            </Customer>
          </xsl:for-each>
        </Report>
      </xsl:template>
    </xsl:stylesheet>

This performs a two-level grouping: by Customer, then by Account. The source document can contain several tens of thousands of rows.

=> When performing this transformation using MSXML, performance is very acceptable: < 1 minute for a file with 60000 records.

=> However, the same transformation using .Net (1.1) XslTransform seems to take forever. I haven't been able to have it processed completely so far... Unfortunately, .Net is the intended platform.

==> Am I doing something wrong, is this a known problem, and/or can something be done about this?
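To make the intended grouping concrete, here is a small input with hypothetical values (customers A and B, accounts 1 and 2, VALUE elements left out). None of this data comes from the actual files; it only illustrates the structure described above.

```xml
<REPORT>
  <!-- Row 1: customer A, account 1 -->
  <ROW><CUSTOMER>A</CUSTOMER><ACCOUNT>1</ACCOUNT><HOURNUMBER>1</HOURNUMBER></ROW>
  <!-- Row 2: customer A, account 2 -->
  <ROW><CUSTOMER>A</CUSTOMER><ACCOUNT>2</ACCOUNT><HOURNUMBER>1</HOURNUMBER></ROW>
  <!-- Row 3: customer A, account 1 again -->
  <ROW><CUSTOMER>A</CUSTOMER><ACCOUNT>1</ACCOUNT><HOURNUMBER>2</HOURNUMBER></ROW>
  <!-- Row 4: customer B, account 1 -->
  <ROW><CUSTOMER>B</CUSTOMER><ACCOUNT>1</ACCOUNT><HOURNUMBER>1</HOURNUMBER></ROW>
</REPORT>
```

For this input the stylesheet should produce a Report with two Customer elements in document order: Customer A containing Account 1 (copies of rows 1 and 3) and Account 2 (a copy of row 2), followed by Customer B containing Account 1 (a copy of row 4). Because the second key concatenates CUSTOMER and ACCOUNT, account 1 of customer A and account 1 of customer B stay in separate groups.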
Remarks:

- I have also tried the count(. | key('rows-by-customer', CUSTOMER)[1]) = 1 approach, with the same problem.
- I've found a document on MSDN mentioning that the xsl:key implementation had a performance problem. However, this seems to apply to .Net v1.0 (?)
- Following recommendations, I'm using XPathDocument for the input file, and a stream for the output - or would there be better options?
- I've included the source code for the transformation, and the timings of several transformations (using MSXSL and XslTransform) below.

Any help would be greatly appreciated...

Thanks in advance,
Frederik

*****************
C# code to do transformation:

    // Requires: using System; using System.IO;
    //           using System.Xml.XPath; using System.Xml.Xsl;
    string folder = @"D:\Test\grouping\";
    string inputUri = folder + "FlatInput.xml";
    string stylesheet1uri = folder + "FlatInput2Grouped.xslt";
    string outputUri = folder + "groupedOutput_XslTransform.xml";

    // Timestamps taken after each phase.
    DateTime beforeStart = DateTime.Now;
    DateTime afterLoadingInput, afterLoadingStylesheet, afterTransform;

    using (FileStream output = new FileStream(outputUri, FileMode.Create,
                                              FileAccess.Write, FileShare.Read))
    {
        // Load the source document.
        XPathDocument inputDocument = new XPathDocument(inputUri);
        afterLoadingInput = DateTime.Now;

        // Load the stylesheet.
        XslTransform transform = new XslTransform();
        transform.Load(new XPathDocument(stylesheet1uri), null,
                       this.GetType().Assembly.Evidence);
        afterLoadingStylesheet = DateTime.Now;

        // Run the transformation, writing the result to the output stream.
        transform.Transform(inputDocument, null, output, null);
        afterTransform = DateTime.Now;
    }

******************
Timings:

MSXSL:

groupedOutput_verysmall_msxsl.xml (approx. 48 records)
---------------------------------
Source document load time:     27.68 milliseconds
Stylesheet document load time: 1.810 milliseconds
Stylesheet compile time:       1.266 milliseconds
Stylesheet execution time:     6.178 milliseconds

groupedOutput_small_msxsl.xml (144 records)
-----------------------------
Source document load time:     45.77 milliseconds
Stylesheet document load time: 2.145 milliseconds
Stylesheet compile time:       1.297 milliseconds
Stylesheet execution time:     48.66 milliseconds

groupedOutput_medium_msxsl.xml (approx. 10000 records)
------------------------------
Source document load time:     1507 milliseconds
Stylesheet document load time: 11.85 milliseconds
Stylesheet compile time:       .648 milliseconds
Stylesheet execution time:     1634 milliseconds

groupedOutput_msxsl.xml (approx. 60000 records, 30MB file size)
-----------------------
Source document load time:     11276 milliseconds
Stylesheet document load time: 3.053 milliseconds
Stylesheet compile time:       .652 milliseconds
Stylesheet execution time:     40403 milliseconds

============
XSLTRANSFORM:
(timings of second transformation, to exclude JIT compilation time)

groupedOutput_verysmall_XslTransform.xml (48 records)
----------------------------------------
Source document load time:     30 milliseconds
Stylesheet document load time: 10 milliseconds
Stylesheet execution time:     130 milliseconds

groupedOutput_small_XslTransform.xml (144 records)
------------------------------------
Source document load time:     50 milliseconds
Stylesheet document load time: 10 milliseconds
Stylesheet execution time:     270 milliseconds

groupedOutput_medium_XslTransform.xml (approx. 10000 records)
-------------------------------------
[SEVERAL HOURS]

groupedOutput_XslTransform.xml (approx. 60000 records, 30MB file size)
------------------------------
[FOREVER ?]
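For reference, the count() test mentioned in the remarks can be sketched as follows. This is a minimal, single-level illustration that groups by CUSTOMER only. It assumes the test was dropped in as a direct replacement for the generate-id() comparison; the exact stylesheet used for that experiment is not shown in this post.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:key name="rows-by-customer" match="/REPORT/ROW" use="CUSTOMER"/>

  <xsl:template match="/REPORT">
    <Report>
      <!-- A ROW is the first of its customer group when the union of the ROW
           itself and the group's first node contains exactly one node. -->
      <xsl:for-each select="ROW[count(. | key('rows-by-customer', CUSTOMER)[1]) = 1]">
        <Customer Name="{CUSTOMER}">
          <!-- Copy every ROW that shares this customer value. -->
          <xsl:copy-of select="key('rows-by-customer', CUSTOMER)"/>
        </Customer>
      </xsl:for-each>
    </Report>
  </xsl:template>
</xsl:stylesheet>
```

Both forms select the same nodes: each picks the ROW that is identical to the first node returned by key() for its CUSTOMER value. They differ only in how node identity is tested, which is consistent with the observation that switching between them did not change the behaviour.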