Subject: Re: [xsl] Processing Efficiently
From: Karl Stubsjoen <kstubs@xxxxxxxxx>
Date: Fri, 10 Jun 2005 08:43:26 -0700

What is the cost of loading up a variable with a large XML source? So:

<xsl:variable name="my_variable" select="document('my_very_large_source.xml')"/>

where 'my_very_large_source.xml' is 25MB+.

There was a noticeable improvement in processing with the condensed source,
but was that because
A) loading the smaller, condensed XML source was just that much easier for
   the processor, or
B) queries against a condensed XML source are quicker?

> I haven't looked at this in detail, but I think you can almost certainly
> solve your performance problems using keys. Look for constructs like

I will try this. What about variable definitions that point at a section of
your XML source that you refer to often? Would this improve performance too?

On 6/10/05, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I haven't looked at this in detail, but I think you can almost certainly
> solve your performance problems using keys. Look for constructs like
> //thing[property=value] and replace them with calls on the key() function.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx]
> > Sent: 08 June 2005 20:34
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] Processing Efficiently
> >
> > I have already had to reduce the size of the XML quite a bit by sheer
> > element renaming and elimination of unused elements. $s used to be 25MB,
> > but by eliminating unused elements (I really needed only 2), by renaming
> > "xlsRow" to "R" and "xlsColumn" to "C", and by renaming the attribute
> > "column" to "c", I was able to reduce the size by 1/3.
> >
> > The thing is this: $s is my master doc and contains the lookup records.
> > I have many individual docs that will be compared against $s, and these
> > files range in size from 20KB to 5MB (approx.). I don't mind a
> > different approach (for example, reducing the $s source). I'm just curious
> > how others would approach something like this. How would you arrange
> > such documents for this sort of processing?
> >
> > The scenario is:
> > Large data file for lookups / validation (10 to 20MB)
> > Individual data files (up to 5MB)
> > As individual data files refresh, identify those items that exist in
> > the master list. Again, this is a topic of "Performance" and "Best
> > Practice" for performing frequent validations of documents this size.
> >
> > On 6/8/05, tomas.vanek@xxxxxxxxxxxxx
> > <tomas.vanek@xxxxxxxxxxxxx> wrote:
> > > using keys could help to speed up the transformation (here is just the
> > > idea):
> > >
> > > ...
> > > <xsl:key name="summaryInvoice"
> > >     use="document('summary.xml')//xls/R" match="C[@c='I']"/>
> > >
> > > ...
> > > <xsl:template match="xlsRow">
> > >   <xsl:variable name="current_invoice"
> > >       select="xlsColumn[@column='Invoice_#']"/>
> > >   <xsl:variable name="current_balance"
> > >       select="key('summaryInvoice', $current_invoice)/C[@c='B']"/>
> > >   <xsl:variable name="diff_balance"
> > >       select="$current_balance - xlsColumn[@column='Balance']"/>
> > > ...
> > >
> > > tomi
> > >
> > > -----Original Message-----
> > > From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx]
> > > Sent: Wednesday, June 08, 2005 10:08 AM
> > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > Subject: [xsl] Processing Efficiently
> > >
> > > Hello,
> > > I would like to optimize the following:
> > >
> > > Where $s is a 5MB document and the source document is approx. 2-5MB.
> > > The goal: copy everything in the source that exists in $s.
> > > Catch: need to know the value of the balance in $s.
> > >
> > > $s looks like:
> > > <xls>
> > >   <R row="2">
> > >     <C c="I">2AA9379</C><!-- match value "invoice" -->
> > >     <C c="B">-127.5</C><!-- this is the balance -->
> > >   </R>
> > >   ...
> > > </xls>
> > >
> > > <xsl:stylesheet version="1.0"
> > >     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> > >   <xsl:output method="xml" indent="yes" encoding="utf-8"/>
> > >
> > >   <xsl:variable name="s"
> > >       select="document('summarydata/summaryreduced.xml')//xls/R"/>
> > >
> > >   <xsl:template match="/">
> > >     <result>
> > >       <xsl:apply-templates
> > >           select="xls/xlsRow[xlsColumn[@column='Invoice_#']=$s/C[@c='I'] |
> > >                   xlsColumn[@column='Balance'][not(.= $s/C[@c='B'])]]"/>
> > >     </result>
> > >   </xsl:template>
> > >
> > >   <xsl:template match="xlsRow">
> > >     <xsl:variable name="current_invoice"
> > >         select="xlsColumn[@column='Invoice_#']"/>
> > >     <xsl:variable name="current_balance"
> > >         select="$s[C[@c='I']=$current_invoice]/C[@c='B']"/>
> > >     <xsl:variable name="diff_balance"
> > >         select="$current_balance - xlsColumn[@column='Balance']"/>
> > >     <xsl:copy>
> > >       <xsl:apply-templates select="@*"/>
> > >       <xsl:attribute name="current_balance"><xsl:value-of
> > >           select="$current_balance"/></xsl:attribute>
> > >       <xsl:attribute name="diff_balance"><xsl:value-of
> > >           select="$diff_balance"/></xsl:attribute>
> > >       <xsl:apply-templates select="xlsColumn"/>
> > >     </xsl:copy>
> > >   </xsl:template>
> > >
> > >   <xsl:template match="@*">
> > >     <xsl:copy>
> > >       <xsl:apply-templates select="@*"/>
> > >     </xsl:copy>
> > >   </xsl:template>
> > >
> > >   <xsl:template match="xlsColumn">
> > >     <xsl:copy-of select="."/>
> > >   </xsl:template>
> > >
> > > </xsl:stylesheet>
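
For anyone reading this thread later, here is a rough sketch of how the key-based lookup suggested above might look in XSLT 1.0. It is not taken from any of the messages; it assumes the reduced summary format shown in the thread (xls/R/C with c="I" and c="B") and reuses the file name from the original stylesheet. The variable names $summary, $row, $invoice, $hit and $balance are made up for illustration. Two points it relies on: the key is declared with match on the R rows and use on the invoice cell, and key() only searches the document that contains the context node, so a single-node xsl:for-each is used to switch context into the summary document. It also simplifies the selection to "rows whose invoice exists in the summary", which may not be exactly what the original predicate intended.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Index every R row by the value of its invoice cell. The key applies
       to whichever document holds the context node when key() is called. -->
  <xsl:key name="summaryInvoice" match="R" use="C[@c='I']"/>

  <!-- Load the master document once, as a document root (assumed path). -->
  <xsl:variable name="summary"
      select="document('summarydata/summaryreduced.xml')"/>

  <xsl:template match="/">
    <result>
      <xsl:apply-templates select="xls/xlsRow"/>
    </result>
  </xsl:template>

  <xsl:template match="xlsRow">
    <xsl:variable name="row" select="."/>
    <xsl:variable name="invoice"
        select="string(xlsColumn[@column='Invoice_#'])"/>
    <!-- for-each over one node: only there to make the summary document
         the context so that key() searches it. -->
    <xsl:for-each select="$summary">
      <xsl:variable name="hit" select="key('summaryInvoice', $invoice)[1]"/>
      <!-- Rows with no matching invoice in the summary are skipped. -->
      <xsl:if test="$hit">
        <xsl:variable name="balance" select="$hit/C[@c='B']"/>
        <!-- Switch back to the source row before copying it. -->
        <xsl:for-each select="$row">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="current_balance">
              <xsl:value-of select="$balance"/>
            </xsl:attribute>
            <xsl:attribute name="diff_balance">
              <xsl:value-of
                  select="$balance - xlsColumn[@column='Balance']"/>
            </xsl:attribute>
            <xsl:copy-of select="xlsColumn"/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The difference from the original stylesheet is that each source row does one key lookup instead of comparing against every R in $s, so the cost of the lookups should grow much more slowly with the size of the summary. How much that helps in practice depends on the processor, since it still has to parse the summary and build the index once.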