Re: [xsl] Processing Efficiently

Subject: Re: [xsl] Processing Efficiently
From: JBryant@xxxxxxxxx
Date: Wed, 8 Jun 2005 15:00:12 -0500
Hi, Karl,

I'd usually look at categorization. For example, I create data 
dictionaries that describe the structure and content of data warehouses. 
The elements that need to be described fall into some natural categories: 
tables, columns, domains, constraints, etc. By keeping table information 
in one file (or several, actually), column information in another, domain 
information in a third, and so on, I can keep any given source file down 
to a reasonable size. Then, to create my output files (the data 
dictionaries), I pull the elements I need from each file and combine them. 
For each dictionary (which changes per customer), I use a control file 
that tells me which elements to get from which file.
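
A minimal sketch of how such a control file might drive the assembly, assuming XSLT 1.0. The control-file format, the file names (tables.xml, columns.xml), and the id attribute are hypothetical and used only for illustration; they are not the actual format described above.

```xml
<?xml version="1.0"?>
<!-- Hypothetical control file that this stylesheet takes as its input:

     <dictionary customer="example">
       <include file="tables.xml"  id="T_ORDERS"/>
       <include file="columns.xml" id="C_ORDER_ID"/>
     </dictionary>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- One output dictionary per control file. -->
  <xsl:template match="/dictionary">
    <dictionary customer="{@customer}">
      <xsl:apply-templates select="include"/>
    </dictionary>
  </xsl:template>

  <!-- For each entry, open the named source file and copy the element
       whose (assumed) id attribute matches. A relative file name is
       resolved against the location of the control file. -->
  <xsl:template match="include">
    <xsl:variable name="wanted" select="@id"/>
    <xsl:copy-of select="document(@file)//*[@id = $wanted]"/>
  </xsl:template>
</xsl:stylesheet>
```

Each source file is only parsed when a control file names it, so a per-customer dictionary touches just the categories it needs.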

Also, you might want to look at topic mapping (search for XTM). Through 
topic mapping, you can create associations between various things, 
regardless of which files contain the things. It would be some work to set 
up, but it would let you break files at arbitrary size points and still 
find the right stuff. (It might be easier to break at something slightly 
less arbitrary than size, such as letter of the alphabet.)

Just some ideas.

Jay Bryant
Bryant Communication Services
(presently consulting at Synergistic Solution Technologies)





Karl Stubsjoen <kstubs@xxxxxxxxx> wrote on 06/08/2005 02:33 PM
(to xsl-list@xxxxxxxxxxxxxxxxxxxxxx, subject "Re: [xsl] Processing Efficiently"):

I already had to reduce the size of the XML quite a bit by sheer
element renaming and elimination of unused elements.  $s used to be 25MB,
but by eliminating unused elements (I really needed only 2) and by renaming
"xlsRow" to "R" and "xlsColumn" to "C" and by renaming the attribute
"column" to "c" I was able to reduce the size by 1/3.

The thing is this:  $s is my master doc; it contains the lookup records.
I have many individual docs that will be compared against $s, and these
files range in size from 20KB to 5MB (approx.).  I don't mind a
different approach (for example, reducing the $s source).  I'm just curious
how others would approach something like this.  How would you arrange
such documents for this sort of processing?

The scenario is:
Large data file for lookups / validation (10 to 20MB)
Individual data files (up to 5MB) 
As individual data files refresh, identify those items that exist in
the master list.  Again, this is a topic of "Performance" and "Best
Practice" for performing frequent validations of documents this size.
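
A minimal sketch of this lookup using xsl:key, assuming XSLT 1.0, the reduced master format (R and C elements), and the source format and master file path from the original stylesheet quoted below. The key name masterByInvoice and the variable names are made up for illustration. In XSLT 1.0, key() only searches the document that contains the context node, so the sketch switches context to the master document before calling it.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Index every master row (R) by its invoice number (C with c="I"). -->
  <xsl:key name="masterByInvoice" match="R" use="C[@c='I']"/>

  <!-- Root of the master document; the path is taken from the original post. -->
  <xsl:variable name="master"
      select="document('summarydata/summaryreduced.xml')"/>

  <xsl:template match="/">
    <result>
      <xsl:apply-templates select="xls/xlsRow"/>
    </result>
  </xsl:template>

  <xsl:template match="xlsRow">
    <xsl:variable name="row" select="."/>
    <xsl:variable name="invoice" select="xlsColumn[@column='Invoice_#']"/>
    <!-- Switch context to the master document so key() searches it. -->
    <xsl:for-each select="$master">
      <xsl:variable name="hit" select="key('masterByInvoice', $invoice)"/>
      <!-- Only rows whose invoice exists in the master list are copied. -->
      <xsl:if test="$hit">
        <xsl:variable name="balance" select="$hit[1]/C[@c='B']"/>
        <!-- Switch back to the source row to build the output element. -->
        <xsl:for-each select="$row">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="current_balance">
              <xsl:value-of select="$balance"/>
            </xsl:attribute>
            <xsl:attribute name="diff_balance">
              <xsl:value-of
                  select="$balance - xlsColumn[@column='Balance']"/>
            </xsl:attribute>
            <xsl:copy-of select="xlsColumn"/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With a key, a processor can typically build the invoice index once per document instead of scanning every R for every source row, which is where comparisons against a 10 to 20MB master file tend to spend their time.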



On 6/8/05, tomas.vanek@xxxxxxxxxxxxx <tomas.vanek@xxxxxxxxxxxxx> wrote:
> using keys could help to speed up the transformation (here is just the
> idea):
> 
> ...
>        <xsl:key name="summaryInvoice" match="R" use="C[@c='I']"/>
> 
> ...
>        <xsl:template match="xlsRow">
>                <xsl:variable name="current_invoice"
> select="xlsColumn[@column='Invoice_#']"/>
>                <!-- in XSLT 1.0, key() searches the document of the
>                     context node, so switch to summary.xml first -->
>                <xsl:variable name="current_balance">
>                        <xsl:for-each select="document('summary.xml')">
>                                <xsl:value-of
> select="key('summaryInvoice', $current_invoice)/C[@c='B']"/>
>                        </xsl:for-each>
>                </xsl:variable>
>                <xsl:variable name="diff_balance"
> select="$current_balance - xlsColumn[@column='Balance']"/>
> ...
> 
> tomi
> 
> 
> -----Original Message-----
> From: Karl Stubsjoen [mailto:kstubs@xxxxxxxxx]
> Sent: Wednesday, June 08, 2005 10:08 AM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Processing Efficiently
> 
> Hello,
> I would like to optimize the following:
> 
> Where $s is a 5MB document and the source document is approx. 2-5MB.
> The goal:  copy everything in the source that exists in $s.
> Catch:  need to know the value of the balance in $s.
> 
> $s looks like:
> <xls>
> <R row="2">
>  <C c="I">2AA9379</C><!-- match value "invoice" -->
>  <C c="B">-127.5</C><!-- this is the balance --> </R> ...
> </xls>
> 
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output method="xml" indent="yes" encoding="utf-8"/>
> 
> <xsl:variable name="s"
> select="document('summarydata/summaryreduced.xml')//xls/R"/>
> 
> <xsl:template match="/">
> <result>
> <xsl:apply-templates
> select="xls/xlsRow[xlsColumn[@column='Invoice_#']=$s/C[@c='I'] |
> xlsColumn[@column='Balance'][not(.= $s/C[@c='B'])]]"/> </result>
> </xsl:template>
> 
> <xsl:template match="xlsRow">
> <xsl:variable name="current_invoice"
> select="xlsColumn[@column='Invoice_#']"/>
> <xsl:variable name="current_balance"
> select="$s[C[@c='I']=$current_invoice]/C[@c='B']"/>
> <xsl:variable name="diff_balance" select="$current_balance -
> xlsColumn[@column='Balance']"/> <xsl:copy> <xsl:apply-templates
> select="@*"/> <xsl:attribute name="current_balance"><xsl:value-of
> select="$current_balance"/></xsl:attribute>
> <xsl:attribute name="diff_balance"><xsl:value-of
> select="$diff_balance"/></xsl:attribute>
>  <xsl:apply-templates select="xlsColumn"/> </xsl:copy> </xsl:template>
> 
> <xsl:template match="@*">
> <xsl:copy>
>  <xsl:apply-templates select="@*"/>
> </xsl:copy>
> </xsl:template>
> 
> <xsl:template match="xlsColumn">
> <xsl:copy-of select="."/>
> </xsl:template>
> 
> </xsl:stylesheet>
> 
> 
> 
