Re: [xsl] Performance problem in transformation

Subject: Re: [xsl] Performance problem in transformation
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Fri, 22 Jun 2001 06:50:26 +0100
Hi Shashank,

> I am trying to filter out duplicate records from input XML document.
> If I have around 80 records in the XML document and out of which 43
> are unique, transformation is taking forever to complete. (the size
> of this input XML document is 223K)
> Can you suggest any better ways of removing duplicate records ?

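From the bit of XSLT that you posted, it looks as though your source
is something like:

<event>
  <sales_orders_sd_doc>...</sales_orders_sd_doc>
  <sales_orders_base_uom>...</sales_orders_base_uom>
  <sales_orders_div_qty>...</sales_orders_div_qty>
  <sales_orders_exchg_rate_v>...</sales_orders_exchg_rate_v>
  <sales_orders_sd_doc>...</sales_orders_sd_doc>
  ...
</event>
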
The most efficient method of identifying unique records is to use the
Muenchian method.  This uses a key to identify all the records with
the same identifier.  In your case, you want the key to index the
sales_orders_sd_doc elements by their value.  The key has to match the
elements that you're indexing (i.e. the sales_orders_sd_doc elements)
and use the identifying value (i.e. the value of that element).  You
can give it any name that you want:

<xsl:key name="sales_orders"
         match="sales_orders_sd_doc"
         use="." />

With the key set up (this goes at the top level of your stylesheet),
you can then retrieve all the sales_orders_sd_doc elements with a
particular value with the key() function.  For example, to get all
that have the value 'ABC', you can use:

  key('sales_orders', 'ABC')
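
If it helps to picture what the key builds, here is a rough Python
analogue using only the standard library's xml.etree.ElementTree. It
is an illustration of the idea, not how any particular XSLT processor
implements xsl:key: the index maps each string value to the matching
elements in document order, and a lookup plays the role of
key('sales_orders', 'ABC'). The sample document and its values are
made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample input: the values are invented for illustration.
SAMPLE = """
<event>
  <sales_orders_sd_doc>ABC</sales_orders_sd_doc>
  <sales_orders_sd_doc>XYZ</sales_orders_sd_doc>
  <sales_orders_sd_doc>ABC</sales_orders_sd_doc>
</event>
"""

def string_value(elem):
    # XPath string value of an element: all descendant text concatenated
    return "".join(elem.itertext())

def build_key(root, match):
    """Rough analogue of <xsl:key match="..." use="." />: map each
    element's string value to the matching elements, in document order."""
    index = defaultdict(list)
    for elem in root.iter(match):  # iter() walks the tree in document order
        index[string_value(elem)].append(elem)
    return index

root = ET.fromstring(SAMPLE)
sales_orders = build_key(root, "sales_orders_sd_doc")

# Analogue of key('sales_orders', 'ABC'): one dictionary lookup, no scan.
print(len(sales_orders["ABC"]))  # 2
```

The point of the index is that it is built once, so every later lookup
avoids rescanning the document.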

Now, the first sales_orders_sd_doc element that will be retrieved from
the key is the one that appears first in document order.  For any
particular value, there will only be a single sales_orders_sd_doc
element that is the first retrieved by the key for that value.  So to
get the unique ones, you need to run over all those elements and work
out whether they are the same as the first one retrieved from the key.
You can compare the two nodes by generating an ID for each and
comparing them.  This gives you the expression:

  /event/sales_orders_sd_doc[generate-id() =
                             generate-id(key('sales_orders', .)[1])]
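
The same test can be sketched in Python, again with made-up data: an
element is kept only if it is the very first element the index holds
for its value. Python's `is` operator compares object identity, which
stands in here for comparing generate-id() values. This only
illustrates the logic; it isn't a replacement for the XSLT.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample input with duplicates.
SAMPLE = """
<event>
  <sales_orders_sd_doc>ABC</sales_orders_sd_doc>
  <sales_orders_sd_doc>XYZ</sales_orders_sd_doc>
  <sales_orders_sd_doc>ABC</sales_orders_sd_doc>
  <sales_orders_sd_doc>XYZ</sales_orders_sd_doc>
  <sales_orders_sd_doc>DEF</sales_orders_sd_doc>
</event>
"""

def string_value(elem):
    # XPath string value: all descendant text concatenated
    return "".join(elem.itertext())

root = ET.fromstring(SAMPLE)
docs = root.findall("sales_orders_sd_doc")  # like /event/sales_orders_sd_doc

# Build the index once (the xsl:key).
index = defaultdict(list)
for d in docs:
    index[string_value(d)].append(d)

# Muenchian test: keep d only if it is the first node the key returns
# for its own value (identity comparison, like comparing generate-id()).
unique_list = [d for d in docs if d is index[string_value(d)][0]]

print([string_value(d) for d in unique_list])  # ['ABC', 'XYZ', 'DEF']
```

Each record costs one dictionary lookup rather than a comparison with
every earlier record, so the work grows roughly in proportion to the
number of records. XSLT processors generally back keys with a similar
hashed index, which is where the Muenchian method gets its speed,
though the details are up to the processor.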

So you can set your $unique-list variable to this node set:

<xsl:variable name="unique-list"
              select="/event/sales_orders_sd_doc
                         [generate-id() =
                          generate-id(key('sales_orders', .)[1])]" />

The other source of inefficiency in your design is how you're
iterating over the nodes, using an index to access them rather than
just applying templates to the nodes.  I'm not sure that I can exactly
follow why you're doing what you're doing, but I think that all you
need to do is apply templates to the nodes in the $unique-list
variable:

<xsl:template match="event">
   <ROOT message="test">
      <xsl:apply-templates select="$unique-list" />
   </ROOT>
</xsl:template>

And then have a template that matches them and creates the Row
elements that you want.  You can get the values for the various fields
in the Row by looking at the immediate following siblings for the
sales_orders_sd_doc element that you're currently looking at using the
following-sibling:: axis (if the related fields actually come *before*
the sales_orders_sd_doc element, then use the preceding-sibling:: axis
instead):

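<xsl:template match="sales_orders_sd_doc">
   <!-- the content of Row is illustrative: output the fields you need -->
   <Row>
      <xsl:copy><xsl:value-of select="." /></xsl:copy>
      <xsl:copy-of select="following-sibling::sales_orders_base_uom[1]" />
      <xsl:copy-of select="following-sibling::sales_orders_div_qty[1]" />
      <xsl:copy-of select="following-sibling::sales_orders_exchg_rate_v[1]" />
   </Row>
</xsl:template>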
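
To make the following-sibling::...[1] lookups concrete, here is a
Python sketch over a hypothetical flat layout in which each
sales_orders_sd_doc is followed by its related fields. For each
record it finds the nearest later sibling with each field name, which
is what following-sibling::sales_orders_base_uom[1] selects. The field
values, and the use of a dictionary to stand for a Row, are made up
for the example; it skips the de-duplication step to stay focused on
the sibling lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: values invented for illustration.
SAMPLE = """
<event>
  <sales_orders_sd_doc>ABC</sales_orders_sd_doc>
  <sales_orders_base_uom>EA</sales_orders_base_uom>
  <sales_orders_div_qty>10</sales_orders_div_qty>
  <sales_orders_exchg_rate_v>1.5</sales_orders_exchg_rate_v>
  <sales_orders_sd_doc>XYZ</sales_orders_sd_doc>
  <sales_orders_base_uom>KG</sales_orders_base_uom>
  <sales_orders_div_qty>3</sales_orders_div_qty>
  <sales_orders_exchg_rate_v>0.9</sales_orders_exchg_rate_v>
</event>
"""

FIELDS = ["sales_orders_base_uom", "sales_orders_div_qty",
          "sales_orders_exchg_rate_v"]

def first_following_sibling(siblings, pos, tag):
    """Analogue of following-sibling::tag[1]: nearest later sibling named tag."""
    for elem in siblings[pos + 1:]:
        if elem.tag == tag:
            return elem
    return None

root = ET.fromstring(SAMPLE)
children = list(root)  # all children of <event>, in document order

rows = []  # one dict per sales_orders_sd_doc (stand-in for a Row element)
for pos, elem in enumerate(children):
    if elem.tag != "sales_orders_sd_doc":
        continue
    row = {"sd_doc": elem.text}
    for field in FIELDS:
        sib = first_following_sibling(children, pos, field)
        row[field] = sib.text if sib is not None else None
    rows.append(row)

print(rows[1]["sales_orders_base_uom"])  # KG
```

Note that following-sibling::X[1] will pick up a field belonging to a
later record if the current record happens to lack it; the sketch
mirrors that behaviour, so it only works if every record carries all
of its fields.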

I hope that helps,


Jeni Tennison
