RE: [xsl] Processing Efficiently

Subject: RE: [xsl] Processing Efficiently
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 10 Jun 2005 14:25:02 +0100
I haven't looked at this in detail, but I think you can almost certainly
solve your performance problems using keys. Look for constructs like
//thing[property=value] and replace them with calls on the key() function.
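
As a rough, untested sketch of what that replacement might look like for the stylesheet quoted below: the key name "rowByInvoice" and the variable names are made up for illustration, and the file path is simply the one from the original post. It only filters on whether the invoice exists in the lookup document, so it is not a drop-in equivalent of the original select expression. Note that in XSLT 1.0 key() searches only the document containing the context node, hence the xsl:for-each that switches to the lookup document before the call.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Index each lookup row R by its invoice number (the C with c="I").
       "rowByInvoice" is a hypothetical name. -->
  <xsl:key name="rowByInvoice" match="R" use="C[@c='I']"/>

  <!-- Root of the lookup document; path copied from the original post. -->
  <xsl:variable name="summary"
      select="document('summarydata/summaryreduced.xml')"/>

  <xsl:template match="/">
    <result>
      <xsl:apply-templates select="xls/xlsRow"/>
    </result>
  </xsl:template>

  <xsl:template match="xlsRow">
    <xsl:variable name="row" select="."/>
    <xsl:variable name="invoice" select="xlsColumn[@column='Invoice_#']"/>
    <!-- Make the lookup document the context so key() searches it. -->
    <xsl:for-each select="$summary">
      <xsl:variable name="hit" select="key('rowByInvoice', $invoice)"/>
      <!-- Emit the row only when its invoice exists in the lookup data. -->
      <xsl:if test="$hit">
        <xlsRow>
          <xsl:copy-of select="$row/@*"/>
          <xsl:attribute name="current_balance">
            <xsl:value-of select="$hit/C[@c='B']"/>
          </xsl:attribute>
          <xsl:attribute name="diff_balance">
            <xsl:value-of
                select="$hit/C[@c='B'] - $row/xlsColumn[@column='Balance']"/>
          </xsl:attribute>
          <xsl:copy-of select="$row/xlsColumn"/>
        </xlsRow>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The point of the key is that the processor can build the index over the lookup document once, so each row costs one indexed lookup rather than a scan of every R element.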

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx] 
> Sent: 08 June 2005 20:34
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Processing Efficiently
> 
> I had already reduced the size of the XML quite a bit by sheer
> element renaming and elimination of unused elements.  $s used to be 25MB,
> but by eliminating unused elements (I really only needed 2) and by renaming
> "xlsRow" to "R" and "xlsColumn" to "C" and by renaming the attribute
> "column" to "c" I was able to reduce the size by 1/3.
> 
> The thing is this:  $s is my master doc; it contains the lookup records.
> I have many individual docs that will be compared against $s, and these
> files range in size from 20KB to 5MB (approx.).  I don't mind a
> different approach (for example reducing the $s source).  I'm just curious
> how others would approach something like this.  How would you arrange
> such documents for this sort of processing?
> 
> The scenario is:
> Large data file for lookups / validation (10 to 20MB)
> Individual data files (up to 5MB) 
> As individual data files refresh, identify those items that exist in
> the master list.  Again, this is a topic of "Performance" and "Best
> Practice" for performing frequent validations of documents this size.
> 
> 
> 
> On 6/8/05, tomas.vanek@xxxxxxxxxxxxx 
> <tomas.vanek@xxxxxxxxxxxxx> wrote:
> > using keys could help to speed up the transformation (here is just the
> > idea):
> > 
> > ...
> >        <!-- key() searches the context node's document, so call it
> >             with a node of document('summary.xml') as the context -->
> >        <xsl:key name="summaryInvoice"
> > match="R" use="C[@c='I']"/>
> > 
> > ...
> >        <xsl:template match="xlsRow">
> >                <xsl:variable name="current_invoice"
> > select="xlsColumn[@column='Invoice_#']"/>
> >                <xsl:variable name="current_balance"
> > select="key('summaryInvoice', $current_invoice)/C[@c='B']"/>
> >                <xsl:variable name="diff_balance"
> > select="$current_balance - xlsColumn[@column='Balance']"/>
> > ...
> > 
> > tomi
> > 
> > 
> > -----Original Message-----
> > From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx]
> > Sent: Wednesday, June 08, 2005 10:08 AM
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: [xsl] Processing Efficiently
> > 
> > Hello,
> > I would like to optimize the following:
> > 
> > Where $s is a 5MB document and the source document is approx. 2-5MB.
> > The goal:  copy everything in the source that exists in $s.
> > Catch:  need to know the value of the balance in $s.
> > 
> > $s looks like:
> > <xls>
> > <R row="2">
> >  <C c="I">2AA9379</C><!-- match value "invoice" -->
> >  <C c="B">-127.5</C><!-- this is the balance -->
> > </R>
> > ...
> > </xls>
> > 
> > <xsl:stylesheet version="1.0"
> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> > <xsl:output method="xml" indent="yes" encoding="utf-8"/>
> > 
> > <xsl:variable name="s"
> > select="document('summarydata/summaryreduced.xml')//xls/R"/>
> > 
> > <xsl:template match="/">
> > <result>
> > <xsl:apply-templates
> > select="xls/xlsRow[xlsColumn[@column='Invoice_#']=$s/C[@c='I'] |
> > xlsColumn[@column='Balance'][not(.= $s/C[@c='B'])]]"/>
> > </result>
> > </xsl:template>
> > 
> > <xsl:template match="xlsRow">
> > <xsl:variable name="current_invoice"
> > select="xlsColumn[@column='Invoice_#']"/>
> > <xsl:variable name="current_balance"
> > select="$s[C[@c='I']=$current_invoice]/C[@c='B']"/>
> > <xsl:variable name="diff_balance" select="$current_balance -
> > xlsColumn[@column='Balance']"/>
> > <xsl:copy>
> >  <xsl:apply-templates select="@*"/>
> >  <xsl:attribute name="current_balance"><xsl:value-of
> > select="$current_balance"/></xsl:attribute>
> >  <xsl:attribute name="diff_balance"><xsl:value-of
> > select="$diff_balance"/></xsl:attribute>
> >  <xsl:apply-templates select="xlsColumn"/>
> > </xsl:copy>
> > </xsl:template>
> > 
> > <xsl:template match="@*">
> > <xsl:copy>
> >  <xsl:apply-templates select="@*"/>
> > </xsl:copy>
> > </xsl:template>
> > 
> > <xsl:template match="xlsColumn">
> > <xsl:copy-of select="."/>
> > </xsl:template>
> > 
> > </xsl:stylesheet>
> > 
> > 
> > 
