Re: [xsl] Benefits of using xsl:key

Subject: Re: [xsl] Benefits of using xsl:key
From: Michael Müller-Hillebrand <mmh@xxxxxxxxxxxxx>
Date: Tue, 3 Nov 2009 10:21:46 +0100
Hi Jesper,

On 03.11.2009 at 08:55, Jesper Tverskov wrote:

> We often hear that the use of xsl:key can speed up a transformation
> many times. But XSLT processors seem so optimized for speed, anyway,
> these days, that I have found it difficult to construct a test input
> file and a test XSLT stylesheet demonstrating the dramatic wonders of
> using xsl:key.
>
> How big should the input document be?
> How complicated the hierarchy, etc?
>
> I would like to impress my students with a nice little transformation
> example, instead of just telling them that there can be huge benefits.

About a year ago I had just such a situation. With my test documents
everything went fine, but the customer used XML files of about 20 MB,
and for those the transformation took a very long time. They complained
about it only after months, and I was really ashamed.

The task was to remove duplicate content from an XML file, to ease the
translation process. All duplicate elements should be left in the
result file, but empty and with an extra attribute @xrefid pointing to
the first occurrence, so that the process could be reverted after
translation.

The input was essentially a list; the original had many more
attributes and, of course, thousands of entries:

<?xml version="1.0" encoding="UTF-8"?>
<Doc>
 <value oid="f37">some text</value>
 <value oid="f61">some text</value>
 <value oid="f042">some other text</value>
</Doc>

The desired result was:

<?xml version="1.0" encoding="UTF-8"?>
<Doc>
 <value oid="f37">some text</value>
 <value oid="f61" xrefid="f37" />
 <value oid="f042">some other text</value>
</Doc>
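
To see a difference between the two approaches below, the input has to be much larger than this sample. A sketch of a generator stylesheet, assuming an XSLT 2.0 processor that can start at a named template (how you do that depends on the processor). The template name main, the parameters count and distinct, and the generated texts are made up for illustration:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

 <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

 <!-- hypothetical parameters: total entries and number of distinct texts -->
 <xsl:param name="count" as="xs:integer" select="100000"/>
 <xsl:param name="distinct" as="xs:integer" select="5000"/>

 <!-- hypothetical entry point; no source document is needed -->
 <xsl:template name="main">
  <Doc>
   <xsl:for-each select="1 to $count">
    <!-- texts repeat with period $distinct, so duplicates are guaranteed -->
    <value oid="f{.}">some text <xsl:value-of select=". mod $distinct"/></value>
   </xsl:for-each>
  </Doc>
 </xsl:template>

</xsl:stylesheet>

With the defaults every text occurs 20 times, so 95,000 of the 100,000 entries are duplicates. Raising count while keeping distinct fixed should make the gap between the two approaches more visible.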

My first take was (I know I should have known better):

<xsl:template match="value[.=preceding::value]">
 <xsl:copy>
  <!-- add attribute and skip content -->
  <xsl:apply-templates select="@*"/>
  <xsl:attribute name="xrefid" select="preceding::value[.=current()][last()]/@oid"/>
 </xsl:copy>
</xsl:template>

<xsl:template match="value">
 <xsl:copy>
  <xsl:apply-templates select="@*|node()"/>
 </xsl:copy>
</xsl:template>


This was really bad with large files, because for every value a lookup
over all preceding values had to be made, so the work grows roughly
with the square of the number of entries. So I changed it to

<xsl:key name="value-content" match="value" use="."/>

<xsl:template match="value">
 <xsl:variable name="first" select="key('value-content', .)[1]"/>
 <xsl:copy>
  <xsl:choose>
   <xsl:when test=". is $first">
    <!-- pass through content -->
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates/>
   </xsl:when>
   <xsl:otherwise>
     <!-- add attribute and skip content -->
     <xsl:attribute name="xrefid" select="$first/@oid"/>
     <xsl:apply-templates select="@*"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:copy>
</xsl:template>

Now the transformation was a breeze, even with really large files.
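
For anyone who wants to try it, here is a sketch of a complete stylesheet around the keyed template. The identity template is my assumption: the snippets above rely on some template that copies attributes, because the built-in rule for attributes would only output their text. It also assumes an XSLT 2.0 processor, because of the "is" operator and the select attribute on xsl:attribute. Attributes are copied once before the xsl:choose instead of in both branches; the result should be the same.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- index all value elements by their string value -->
 <xsl:key name="value-content" match="value" use="."/>

 <!-- identity template (assumed, not part of the snippets above):
      copies attributes and every other node unchanged -->
 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="value">
  <!-- key() returns nodes in document order, so [1] is the first occurrence -->
  <xsl:variable name="first" select="key('value-content', .)[1]"/>
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <xsl:choose>
    <xsl:when test=". is $first">
     <!-- first occurrence: pass through content -->
     <xsl:apply-templates/>
    </xsl:when>
    <xsl:otherwise>
     <!-- duplicate: point to the first occurrence and skip content -->
     <xsl:attribute name="xrefid" select="$first/@oid"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:copy>
 </xsl:template>

</xsl:stylesheet>

The template for value wins over the identity template without an explicit priority, because a plain element name has a higher default priority than node().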

HTH,

- Michael

PS: Examples have been edited and not tested.

--
_______________________________________________________________
Michael Müller-Hillebrand: Dokumentations-Technologie
Adobe Certified Expert, FrameMaker
Solutions and training, FrameScript, XML/XSL, Unicode
Blog: http://cap-studio.de/ - Tel. +49 (9131) 28747
