Re: [xsl] Processing Efficiently

Subject: Re: [xsl] Processing Efficiently
From: Karl Stubsjoen <kstubs@xxxxxxxxx>
Date: Fri, 10 Jun 2005 08:43:26 -0700
What is the cost for loading up a variable with a large XML source?  So:

<xsl:variable name="my_variable"
select="document('my_very_large_source.xml')"/>

Where 'my_very_large_source.xml' is 25MB +

There was a noticeable improvement in processing time, but was that
because (A) loading the smaller, condensed XML source was simply that
much easier for the processor, or (B) queries against a condensed XML
source are quicker?

> I haven't looked at this in detail, but I think you can almost certainly
> solve your performance problems using keys. Look for constructs like
I will try this.
What about variable definitions that point at a section of your XML
source that you refer to often? Would this improve performance too?
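
For reference, a minimal XSLT 1.0 sketch of the two ideas discussed here: a key over the lookup rows, and a variable bound to the master document. It assumes the condensed R/C layout quoted further down for the master file and the xlsRow/xlsColumn layout for the input. The key name "by-invoice", the variable names, and the file name 'master.xml' are placeholders, not names from the original stylesheet. Whether and when the index is actually built is up to the processor, so treat this as a starting point to measure rather than a guaranteed speedup.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Index every R row by its invoice cell. A key declaration applies to
       whichever document is being searched when key() is called. -->
  <xsl:key name="by-invoice" match="R" use="C[@c='I']"/>

  <!-- Bind the master document once. The variable refers to the parsed
       tree; it does not copy it. The file name is a placeholder. -->
  <xsl:variable name="master" select="document('master.xml')"/>

  <xsl:template match="/">
    <result>
      <xsl:apply-templates select="xls/xlsRow"/>
    </result>
  </xsl:template>

  <xsl:template match="xlsRow">
    <xsl:variable name="row" select="."/>
    <xsl:variable name="invoice" select="xlsColumn[@column='Invoice_#']"/>
    <!-- In XSLT 1.0, key() searches the document that contains the context
         node, so switch context to the master document before the lookup. -->
    <xsl:for-each select="$master">
      <xsl:variable name="hit" select="key('by-invoice', $invoice)"/>
      <!-- Copy the row only when the invoice exists in the master list. -->
      <xsl:if test="$hit">
        <xsl:variable name="balance" select="$hit[1]/C[@c='B']"/>
        <!-- Switch back to the input row so xsl:copy copies xlsRow. -->
        <xsl:for-each select="$row">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="current_balance">
              <xsl:value-of select="$balance"/>
            </xsl:attribute>
            <xsl:attribute name="diff_balance">
              <xsl:value-of select="$balance - xlsColumn[@column='Balance']"/>
            </xsl:attribute>
            <xsl:copy-of select="xlsColumn"/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The variable on its own mainly saves retyping the document() call, since it only holds a reference to nodes that are already parsed. The key is what replaces the repeated scan of every R row for each input row.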


On 6/10/05, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I haven't looked at this in detail, but I think you can almost certainly
> solve your performance problems using keys. Look for constructs like
> //thing[property=value] and replace them with calls on the key() function.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx]
> > Sent: 08 June 2005 20:34
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] Processing Efficiently
> >
> > I had already reduced the size of the XML quite a bit by sheer
> > element renaming and elimination of unused elements.  $s used to be 25MB,
> > but by eliminating unused elements (really needed 2) and by renaming
> > "xlsRow" to "R" and "xlsColumn" to "C" and by renaming the attribute
> > "column" to "c" I was able to reduce the size by 1/3.
> >
> > The thing is this:  $s is my master doc; it contains the lookup records.
> > I have many individual docs that will be compared against $s, and these
> > files range in size from 20KB to 5MB (approx.).  I don't mind a
> > different approach (for example reducing $s source).  I'm just curious
> > how others would approach something like this.  How would you arrange
> > such documentation for this sort of processing?
> >
> > The scenario is:
> > Large data file for lookups / validation (10 to 20MB)
> > Individual data files (up to 5MB)
> > As individual data files refresh, identify those items that exist in
> > the master list.  Again, this is a topic of "Performance" and "Best
> > Practice" for performing frequent validations of documents this size.
> >
> >
> >
> > On 6/8/05, tomas.vanek@xxxxxxxxxxxxx
> > <tomas.vanek@xxxxxxxxxxxxx> wrote:
> > > using keys could help to speed up the transformation (here is just the
> > > idea):
> > >
> > > ...
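> > >        <!-- index the summary rows by invoice number -->
> > >        <xsl:key name="summaryInvoice" match="R" use="C[@c='I']"/>
> > >
> > > ...
> > >        <xsl:template match="xlsRow">
> > >                <xsl:variable name="current_invoice"
> > > select="xlsColumn[@column='Invoice_#']"/>
> > >                <!-- key() searches the document of the context node,
> > >                     so switch to summary.xml for the lookup -->
> > >                <xsl:variable name="current_balance">
> > >                        <xsl:for-each select="document('summary.xml')">
> > >                                <xsl:value-of
> > > select="key('summaryInvoice', $current_invoice)/C[@c='B']"/>
> > >                        </xsl:for-each>
> > >                </xsl:variable>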
> > >                <xsl:variable name="diff_balance"
> > > select="$current_balance - xlsColumn[@column='Balance']"/>
> > > ...
> > >
> > > tomi
> > >
> > >
> > > -----Original Message-----
> > > From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx]
> > > Sent: Wednesday, June 08, 2005 10:08 AM
> > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > Subject: [xsl] Processing Efficiently
> > >
> > > Hello,
> > > I would like to optimize the following:
> > >
> > > Where $s is a 5MB document and the source document is approx. 2-5MB.
> > > The goal:  copy everything in the source that exists in $s.
> > > Catch:  need to know the value of the balance in $s.
> > >
> > > $s looks like:
> > > <xls>
> > > <R row="2">
> > >  <C c="I">2AA9379</C><!-- match value "invoice" -->
> > >  <C c="B">-127.5</C><!-- this is the balance -->
> > > </R>
> > > ...
> > > </xls>
> > >
> > > <xsl:stylesheet version="1.0"
> > > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> > > <xsl:output method="xml" indent="yes" encoding="utf-8"/>
> > >
> > > <xsl:variable name="s"
> > > select="document('summarydata/summaryreduced.xml')//xls/R"/>
> > >
> > > <xsl:template match="/">
> > > <result>
> > > <xsl:apply-templates
> > > select="xls/xlsRow[xlsColumn[@column='Invoice_#']=$s/C[@c='I'] |
> > > xlsColumn[@column='Balance'][not(.= $s/C[@c='B'])]]"/> </result>
> > > </xsl:template>
> > >
> > > <xsl:template match="xlsRow">
> > > <xsl:variable name="current_invoice"
> > > select="xlsColumn[@column='Invoice_#']"/>
> > > <xsl:variable name="current_balance"
> > > select="$s[C[@c='I']=$current_invoice]/C[@c='B']"/>
> > > <xsl:variable name="diff_balance"
> > > select="$current_balance - xlsColumn[@column='Balance']"/>
> > > <xsl:copy>
> > >  <xsl:apply-templates select="@*"/>
> > >  <xsl:attribute name="current_balance"><xsl:value-of
> > > select="$current_balance"/></xsl:attribute>
> > >  <xsl:attribute name="diff_balance"><xsl:value-of
> > > select="$diff_balance"/></xsl:attribute>
> > >  <xsl:apply-templates select="xlsColumn"/>
> > > </xsl:copy>
> > > </xsl:template>
> > >
> > > <xsl:template match="@*">
> > > <xsl:copy>
> > >  <xsl:apply-templates select="@*"/>
> > > </xsl:copy>
> > > </xsl:template>
> > >
> > > <xsl:template match="xlsColumn">
> > > <xsl:copy-of select="."/>
> > > </xsl:template>
> > >
> > > </xsl:stylesheet>
> > >
> > >
> > >
