Re: [xsl] Merging multiple documents efficiently

Subject: Re: [xsl] Merging multiple documents efficiently
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Sat, 3 Feb 2001 11:35:03 +0000
Hi Matt,

> Here is (part of) my existing code. It generally works, but is quite
> inefficient for large input files.
>
> <xsl:template match="Hierarchy">
>      <xsl:for-each select="NodeID[. = $PRD/ListofProducts/Product/NodeID]|
>                       NodeID[. = $HRY/Hierarchies/Hierarchy/NodeID]|
>                       NodeID[. = $MSD/MDSDs/MSDS/NodeID]">
>           <xsl:for-each select="..">
>                // Perform matching based on NodeID
>                ...
>           </xsl:for-each>
>      </xsl:for-each>
> </xsl:template>

One thing that could make it more efficient would be to move the
NodeIDs that you're matching against into variables, so that the node
sets don't have to be constructed each time a NodeID in your primary
document is tested in the XPath:

<xsl:variable name="PRDIDs"
              select="$PRD/ListofProducts/Product/NodeID" />
<xsl:variable name="HRYIDs"
              select="$HRY/Hierarchies/Hierarchy/NodeID" />
<xsl:variable name="MSDIDs"
              select="$MSD/MDSDs/MSDS/NodeID" />
<xsl:variable name="otherIDs"
              select="$PRDIDs | $HRYIDs | $MSDIDs" />
<xsl:template match="Hierarchy">
   <xsl:for-each select="NodeID[. = $otherIDs]">
      ...
   </xsl:for-each>
</xsl:template>

Alternatively, if you've got lots of NodeIDs then you could use a key
to make it more efficient. This would allow you to quickly get
whichever NodeID elements have a particular value.

<xsl:key name="ids" match="NodeID" use="." />

Because you're dealing with different documents, and key() only
searches the document that contains the context node, you need to
change the context node to each of these documents in turn when using
the key. I'm assuming here that $PRD, $HRY and $MSD are the root nodes
(or document elements) of their respective documents.  If you just
want to test whether there's a NodeID in one of the secondary
documents with that value (rather than actually using that node from
the secondary document), then you can use:

<xsl:template match="Hierarchy">
   <xsl:for-each select="NodeID">
      <xsl:if test="($PRD | $HRY | $MSD)[key('ids', current())]">
         ...
      </xsl:if>
   </xsl:for-each>
</xsl:template>
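
If you do need the matching nodes themselves rather than just a test,
one way to sketch it is to switch the context to the secondary
document with an xsl:for-each and call key() from inside it. This is
only an illustration: I'm assuming $PRD is loaded with document(), the
file name 'products.xml' is made up, and the xsl:copy-of is a
placeholder for whatever you actually want to do with the match.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: the product document is loaded like this;
     'products.xml' is a hypothetical file name -->
<xsl:variable name="PRD" select="document('products.xml')" />

<xsl:key name="ids" match="NodeID" use="." />

<xsl:template match="Hierarchy">
   <xsl:for-each select="NodeID">
      <!-- remember the value before the context changes -->
      <xsl:variable name="id" select="string(.)" />
      <!-- make the product document the current document -->
      <xsl:for-each select="$PRD">
         <!-- key() now searches the product document -->
         <xsl:for-each select="key('ids', $id)">
            <!-- placeholder: copy the parent of the matching NodeID -->
            <xsl:copy-of select=".." />
         </xsl:for-each>
      </xsl:for-each>
   </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

The variable is needed because current() inside the inner
xsl:for-each would refer to the node from the secondary document, not
the NodeID from your primary document.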

I notice in your input that you've got repeated NodeID elements in the
primary document.  If you want to get rid of them, then you could also
use the key for that.  This is the Muenchian Method of getting nodes
with unique values:

<xsl:template match="Hierarchy">
   <xsl:for-each select="NodeID[count(.|key('ids', .)[1]) = 1]">
      <xsl:if test="($PRD | $HRY | $MSD)[key('ids', current())]">
         ...
      </xsl:if>
   </xsl:for-each>
</xsl:template>
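
If the count() test is hard to read, an equivalent way of writing it
compares generated IDs instead. Here is a small self-contained sketch
that just prints each distinct NodeID value; the text output is my own
placeholder, not something from your stylesheet. Note that in either
form the key covers the whole primary document, so a NodeID is kept
only the first time its value appears anywhere in that document, not
once per Hierarchy. I'm assuming that's what you want.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />

<xsl:key name="ids" match="NodeID" use="." />

<xsl:template match="Hierarchy">
   <!-- keep a NodeID only if it is the first node that the
        'ids' key returns for its value -->
   <xsl:for-each select="NodeID[generate-id() =
                                generate-id(key('ids', .)[1])]">
      <!-- placeholder output: one distinct value per line -->
      <xsl:value-of select="." />
      <xsl:text>&#10;</xsl:text>
   </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```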

I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

