Re: [xsl] select immediately following siblings with constraints?

Subject: Re: [xsl] select immediately following siblings with constraints?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 23 Feb 2006 12:43:51 -0500
James,

You are asking about a famously difficult area for XSLT to deal with, primarily because the language was designed to go downhill, and operations such as you're describing go up.

Grouping constructs in XSLT 2.0 make them *much* easier to deal with, which is good, but in complex cases like yours there can still be challenges.

The simple answer to your question about XPath is "no". XPath gives no way to look along an axis up to a particular point and no further. (Well, actually I shouldn't say this categorically about XPath 2.0 without giving it some good hard thought first, or inviting others to demonstrate a method. :-) With a little help from XSLT there are ways to get around this limitation (particularly in XPath 2.0), but mostly (so far as I know) these methods haven't been sketched out yet in public forums such as this one. (They are difficult to the point of unmanageability in XPath 1.0 and XPath 2.0 is new.)

What I mean is things like:

<xsl:variable name="next-b" select="following-sibling::b[1]"/>
<xsl:for-each select="following-sibling::a[(. &lt;&lt; $next-b) or not($next-b)]">
...
</xsl:for-each>


but depending on your problem that isn't necessarily enough.

(In fact it's possible but much more difficult to create such constructs in XPath 1.0. This example uses the handly "<<" operator from XPath 2.0, which compares the location of two nodes in document order.)

Due to these difficulties (I hope I'm not just repeating what you know), techniques developed in XSLT (even 1.0) include such methods as the "tree visitor" or forward-stepping pattern, which basically works by using XSLT explicitly to step forward one node at a time, testing along the way. Searching the archives or on the web can turn up examples of this. You can recognize it because there will typically be an apply-templates saying something like:

<xsl:apply-templates select="following-sibling::node()[1]" mode="step"/>

If using XSLT 1.0 I would probably go to this method quickly except in simple cases, where a key-based grouping solution would work. In 2.0 I'd be tempted first to try using its grouping constructs and perhaps fancy XPath 2.0. Which is what I see you're doing.

Bottom line: you're in uncharted waters. Have fun. Tell us what you find! Take a look at those << and >> operators in XPath 2.0.

You could even try such things as

"for $b in following-sibling::b[1] return following-sibling::a[. &lt;&lt; $b]"

which might get it into a single line (by using the 'for' construct in an unorthodox way)....

Cheers,
Wendell

At 02:23 AM 2/23/2006, you wrote:

Hello,

This is a very long message, and I apologize for that.  I've split
this question into a short-ish version followed by a very long
version.  I am using XSL 2.0 draft syntax (as implemented by Saxon).

The short version of my question: Is it possible to write an XPath
expression which selects the immediately adjacent following-siblings
which adhere to a specific criteria?  I'm using a combination of
xsl:if and xsl:for-each-group (with group-adjacent), but I am
wondering if there is a way to do it more directly.

Say I'm processing a list of elements and I want the 'a' elements
which are adjacent to my current context:

  <a id="1"/>
  <b id="2"/>
  <a id="3"/>
  <a id="4"/>
  <d id="5"/>
  <a id="6"/>

if current() is a[@id="1"], I want my XPath expression to evaluate to
nothing since the element following current()
(following-sibling::element()[1]) is '<b id="2"/>'.

If my input sequence was

  <a id="1"/>
  <a id="3"/>
  <a id="4"/>
  <d id="5"/>
  <a id="6"/>

and current() is a[@id="1"], I'd want the expression to select the
sequence

  <a id="3"/>
  <a id="4"/>

I would not want it to include <a id="6"/>, since the chain was
interrupted by '<d id="5"/>'.  As I mentioned above, I'm currently
handling this using combination of xsl:if and xsl:for-each-group.  Is
there another way?

If you have time to be looking over the long-winded version, I've
tried to outline what I'm working on and where I'm getting confused on
how to Do Things Properly.

I'm working out a stylesheet to perform a rather nasty operation:
turning two sets of information selected from an SQL database into a
hierarchy.  I'm dealing with the aspect of performing a merge of
adjacent nodes, and was hoping folks here would be willing to look my
logic over and tell me if there are flaws, or if there is a much
simpler way to handle the problem.

I've got the following sort of input (simplified), which as you can
see is in a rather haphazard order.

  <input>
    <sections>
      <!-- randomly ordered sections, we do know that all tier_num
           values greater than 1 will have a parent defined.  -->
      <section id="1.2" tier_num="2" parent_id="1">Section CDG</section>
      <section id="1.1" tier_num="2" parent_id="1">Section ABC</section>
      <section id="2" tier_num="1">Section XYZ</section>
      <section id="1" tier_num="1">Section ZXY</section>
    </sections>
    <articles>
      <!-- articles are ordered as they should appear in output -->
      <article id="1" section="1.2"/>
      <article id="2" section="2"/>
      <article id="3" section="1.1"/>
      <article id="4" section="1"/>
      <article id="5" section="1.2"/>
    </articles>
  </input>

The real input can actually have an arbitrary number of tiered
sections going to an arbitrary depth.  Given the sample input, I need
to output the articles in the listed order with the proper section
information:

  <output>
    <section>
      <heading>Section ZXY</heading>
      <section>
        <heading>Section CDG</heading>
        <article id="1"/>
      </section>
    </section>
    <section>
      <heading>Section XYZ</heading>
      <article id="2"/>
    </section>
    <section>
      <heading>Section ZXY</heading>
      <section>
        <heading>Section ABC</heading>
        <article id="3"/>
      </section>
      <article id="4"/>
      <section>
        <heading>Section CDG</heading>
        <article id="5"/>
      </section>
    </section>
  </output>

I've written a stylesheet which does this transformation, but I'm
worried about both efficiency and about whether or not I'm missing a
more obvious way to solve the problem.

My current working solution involves creating a series of node sets as
variables, each variable annotated with more information and/or
nesting than the previous, I've broken it down into the following
variable creation steps:

  1) generate list of sections with article children,
  2) fill in any parent sections which are missing,
  3) nest sections per parent_id,
  4) merge sequential sections which share a section id,

The first 2 steps are dealing with lists of sections.  I'm using
xsl:for-each-group with the group-adjacent selector to merge adjacent
sets of articles into single groups, and wrapping each group in its
appropriate section.  I'm then taking this flat list of sections and
inserting any missing sections (ones which did not get created on the
first pass because they didn't happen to have an associated article in
them).  I'm left with a flat list of sections:

 <section id="1" tier_num="1"/>
 <section id="1.2" tier_num="2" parent_id="1">
   <article id="1"/>
 </section>
 <section id="2" tier_num="1">
   <article id="2"/>
 </section>
 <section id="1" tier_num="1"/>
 <section id="1.1" tier_num="2" parent_id="1">
   <article id="3"/>
 </section>
 <section id="1" tier_num="1">
   <article id="4"/>
 </section>
 <section id="1.2" tier_num="2" parent_id="1">
   <article id="1.2"/>
 </section>

My next step is to nest them:

<xsl:variable name="nested_sections">
<xsl:apply-templates select="$all_sections/section[@tier_num = 1]" mode="nest_sections" />
</xsl:variable>
...
<xsl:template match="section" mode="nest_sections" as="element(section)">
<xsl:copy>
<xsl:sequence select="@*|*" />
<xsl:apply-templates
select="following-sibling::section[
(@tier_num = (current()/@tier_num + 1))
and (current() is preceding-sibling::section[@tier_num=current()/@tier_num][1])]"
mode="#current" />
</xsl:copy>
</xsl:template>


which leaves me with

 <section id="1" tier_num="1">
   <section id="1.1" tier_num="2" parent_id="1">
     <article id="1"/>
   </section>
 </section>
 <section id="2" tier_num="1">
   <article id="2"/>
 </section>
 <section id="1" tier_num="1">
   <section id="1.1" tier_num="2" parent_id="1">
     <article id="3"/>
   </section>
 </section>
 <section id="1" tier_num="1">
   <article id="4"/>
   <section id="1.2" tier_num="2" parent_id="1">
     <article id="1.2"/>
   </section>
 </section>

I then merge adjacent sections which share the same section id.  I
came up with two solutions.  I believe the first one is flawed, and
I'm unsure about the efficiency of the second.  I'm calling both
solutions with a named template which recurses if it detects that
adjacent sections with identical ids remain:

<xsl:variable name="merged_sections">
<xsl:call-template name="merge_sections">
<xsl:with-param name="sections" select="$nested_sections"/>
</xsl:call-template>
</xsl:variable>
...
<xsl:template name="merge_sections">
<xsl:param name="sections" as="document-node()"/>
<xsl:choose>
<xsl:when test="exists($sections//section[@id = preceding-sibling::element()[1][self::section]/@id])">
<xsl:call-template name="merge_sections">
<xsl:with-param name="sections">
<xsl:apply-templates select="$sections" mode="merge_sections"/>
</xsl:with-param>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$sections"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>


As you can see, I basically tried to construct a "while" loop which
tested for adjacent sections with identical ids.  I used the XPath
expression

//section[@id = preceding-sibling::element()[1][self::secton]/@id]

instead of

//section[@id = preceding-sibling::section[1]/@id]

to handle a case where two sections with identical ids are separated
by one or more articles.  For example:

 <section id="1" tier_num="1">
   <article id="1"/>
   <section id="1.2" tier_num="2" parent_id="1">
     <article id="2"/>
   </section>
   <article id="3"/>
   <section id="1.2" tier_num="2" parent_id="1">
     <article id="4"/>
   </section>
 </section>

This is a case where I do not want to perform a merge, as it would
result in the articles being listed out of order (article 3 would end
up coming after article 4).

To handle the actual merging, I came up with two solutions, the first
has same sort of problem I was trying to deal with in my "while" loop
test (using [1][self::section]).  I'm at a loss at how to only select
immediately adjacent following-siblings (note the plural).  My 1st
solution fails to account for that when calling xsl:apply-templates:

<!-- solution #1, discarded as being incorrect -->
<xsl:template match="section" mode="merge_sections" as="element(section)?">
<!--
If the immediately preceding element is a section which has the
same id, it should have already been handled. Otherwise we want
to copy it.
-->
<xsl:if test="not(preceding-sibling::element()[1][self::section][@id=current()/@id])">
<xsl:copy>
<xsl:sequence select="@*" />
<xsl:apply-templates select="*" mode="#current"/>
<!--
Merge in any adjacent sections which have the same id
-->
<xsl:apply-templates select="
following-sibling::section[
(@id = current()/@id)
and (current() is preceding-sibling::section[@id != current()/@id][1]/following-sibling::section[1])]/*"
mode="#current"/>
</xsl:copy>
</xsl:if>
</xsl:template>


My second solution combines xsl:if and xsl:for-each-group to try and
ensure we only operate on the following-sibling section(s) if they are
immediately adjacent and share the same section id:

<!-- solution #2, but is it inefficient? -->
<xsl:template match="section" mode="merge_sections" as="element(section)?">
<!--
If the immediately preceding element is a section which has the
same id, it should have already been handled. Otherwise we want
to copy it.
-->
<xsl:if test="not(preceding-sibling::element()[1][self::section][@id=current()/@id])">
<xsl:copy>
<xsl:sequence select="@*" />
<xsl:apply-templates select="*" mode="#current"/>
<xsl:variable name="context" select="current()"/>
<!--
Merge in any adjacent sections which have the same id
-->
<xsl:if test="following-sibling::element()[1][self::section][@id = current()/@id]">
<xsl:for-each-group select="following-sibling::section" group-adjacent="@id">
<xsl:if test="$context is current-group()[1]/preceding-sibling::element()[1]
and $context/@id = current-group()[1]/@id">
<xsl:apply-templates select="current-group()/*" mode="#current"/>
</xsl:if>
</xsl:for-each-group>
</xsl:if>
</xsl:copy>
</xsl:if>
</xsl:template>


So, if I understand what I am doing correctly, I am capturing the
groups of following-sibling sections, then testing each grouping to
see if the sequence is immediately adjacent to my $context, and that
it has the same id.

Is there a better way, along the lines of

./following-sibling::element()[1][self::section]

where we are not limited to just the first element?  Where we can
select all of the consecutive following-sibling section elements?  Is
it wasteful to be running this grouping operation when all I really
want are just the next handful of section elements?

Do you folks see a simpler way to approach the problems?  Am I going
about this the wrong way?


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

Current Thread