[xsl] Some more stuff about selecting unique elements

Subject: [xsl] Some more stuff about selecting unique elements
From: Joerg Pietschmann <joerg.pietschmann@xxxxxx>
Date: Tue, 10 Apr 2001 14:19:39 +0200
Hello,
the "Confused about preceding-sibling..." post prompted the
question(s) at the end of this post.
I have an XML document similar to

<level0>
  <level1>
    <level2>
      <stuff>1</stuff>
      <stuff>2</stuff>
      <stuff>3</stuff>
    </level2>
    <level2>
      <stuff>3</stuff>
      <stuff>4</stuff>
      <stuff>5</stuff>
    </level2>
  </level1>
  <level1>
    <level2>
      <stuff>2</stuff>
      <stuff>4</stuff>
      <stuff>6</stuff>
    </level2>
    <level2>
      <stuff>4</stuff>
      <stuff>6</stuff>
      <stuff>8</stuff>
    </level2>
  </level1>
</level0>

The levelN elements represent some kind of context. I want to copy
the structure while throwing out the level2 elements and any stuff elements
that are duplicated within their level1 context. The same value may still
appear in different level1 contexts:

<level0>
  <level1>
    <stuff>1</stuff>
    <stuff>2</stuff>
    <stuff>3</stuff>
    <stuff>4</stuff>
    <stuff>5</stuff>
  </level1>
  <level1>
    <stuff>2</stuff>
    <stuff>4</stuff>
    <stuff>6</stuff>
    <stuff>8</stuff>
  </level1>
</level0>

After some experiments, the following XSL seems to achieve this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ASCII"/>

  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="level1">
    <level1>
      <xsl:apply-templates select=".//stuff[
                           not(.=preceding::stuff[
                           generate-id(current())
                           =generate-id(ancestor::level1)])]"/>
    </level1>
  </xsl:template>
  
</xsl:stylesheet>

Explanation: select the descendant stuff elements that do not have
the same content as a preceding stuff element whose level1 ancestor
is the current level1 element.

Well, the reason I dislike the preceding axis is performance: imagine
an XML file with some hundreds or thousands of level1 elements. I haven't
checked it with my file containing some 500+ level1 elements, because
for other reasons it is convenient for me to have a batch process that
splits the file into small files each containing one level1 element,
processes them, and merges them in a third step.

What optimisations do XSL processors (read: Saxon) perform while processing
this XSL? Are there other solutions (in pure XSLT 1.0) to the problem that
are better suited to already implemented optimisations? Would it help to
use an xsl:key for selecting the preceding stuff elements with the same
ancestor?
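
To make the xsl:key idea concrete, here is a minimal sketch of what such a
variant might look like. It is an assumption on my part, not something I have
measured: the key name "stuff-in-level1" and the "|" separator are made up for
illustration, and it relies on level1 elements not being nested. Each stuff
element is indexed by the identity of its level1 ancestor combined with its
value, and only the first element of each key group is selected.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ASCII"/>

  <!-- Hypothetical key: level1 identity plus the stuff value -->
  <xsl:key name="stuff-in-level1" match="stuff"
           use="concat(generate-id(ancestor::level1), '|', .)"/>

  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="level1">
    <level1>
      <!-- keep a stuff element only if it is the first node
           in its (level1, value) key group -->
      <xsl:apply-templates select=".//stuff[
                           generate-id()
                           =generate-id(key('stuff-in-level1',
                             concat(generate-id(current()), '|', .))[1])]"/>
    </level1>
  </xsl:template>

</xsl:stylesheet>
```

Whether this is actually faster depends on how the processor builds and looks
up keys. The intent is only to replace the repeated walk along the preceding
axis with one key lookup per stuff element.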

I suppose in XSLT 1.1 (2.0), where RTFs are gone, it would be prudent to
construct a copy of the stuff elements that descend from the given level1
element and select from there, as I think the preceding axis will then work
on the node-set with the copies only. Take the following snippet as an
illustration:

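<xsl:template match="level1">
  <!-- temporary tree holding copies of this level1's stuff elements -->
  <xsl:variable name="stuff">
    <xsl:for-each select=".//stuff">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>
  <level1>
    <!-- $stuff is the root of the temporary tree, so step down to the copies -->
    <xsl:apply-templates select="$stuff/stuff[not(.=preceding::stuff)]"/>
  </level1>
</xsl:template>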

Is this correct? Would this work as expected (with some slack, as it is
obviously untested)? Could it be expected to perform better than the
solution above?

<grin/>

Regards
J.Pietschmann

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

