RE: [xsl] search for aligned elements

Subject: RE: [xsl] search for aligned elements
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 18 Jul 2007 21:28:23 +0100
You haven't really explained exactly what you mean by "aligned elements". My
guess is that you're looking for overlapping events: that is two events E,F
such that E/@start <= F/@end and E/@end >= F/@start (or vice versa). Clearly
a naive approach - examining all pairs of events - will have O(n^2)
performance, and it seems you want to do better than this. A better approach
is to create a sorted sequence of events and then for each event examine its
near neighbours to see if there's an overlapping event in a different track.
If you sort first by start time and then by end time, you can find the
events that overlap a given event E by searching forwards until you find an
event whose start time is later than the end time of E. (It might also have
been found as an overlapping event for some previous event). Because you
want to break out of the loop at this point (for efficiency) you're going to
have to use recursion. So in XSLT 2.0:

<xsl:template match="body">
  <xsl:variable name="sorted-events" as="element(e1)*">
    <xsl:perform-sort select="track/e1"/>
      <xsl:sort select="number(@start)"/>
      <xsl:sort select="number(@end)"/>
    </xsl:perform-sort>
  </xsl:variable>
  <xsl:variable name="overlapping-pairs" as="element(pair)">
    <xsl:for-each select="$sorted-events">
      <xsl:sequence select="f:find-overlapping-events(., 
                                subsequence($sorted-events,
position()+1))"/>
    </xsl:for-each>
  </xsl:variable>
  <!-- now process the pairs. For now, just output them -->
  <xsl:copy-of select="$overlapping-pairs"/>
</xsl:template>

<!-- Find the events in $others that overlap a given event $this -->

<xsl:function name="f:find-overlapping-events">
  <xsl:param name="this" as="element(e1)"/>
  <xsl:param name="others" as="element(e1)*"/>
  <xsl:if test="exists($others) and f:overlaps($this, $others[1])">
        <pair track1="{$this/../@name}" 
              event1="{$this/some_event_id}"
              track2="{$other/../@name}"
              event2="{$other/some_event_id}"/>
        <xsl:sequence select="f:find-overlapping-events($this,
subsequence($others, 2))"/>
  </xsl:if>
</xsl:function>

<!-- Test whether $first overlaps $second

<xsl:function name="f:overlaps" as="xs:boolean">
  <xsl:param name="first" as="element(e1)"/>
  <xsl:param name="second" as="element(e1)"/>
  <xsl:sequence select="number($second/@start) lt number($first/@end)"/>
</xsl:function>

Not tested!

This outputs a sequence of <pair> elements identifying the overlapping event
pairs, you can then do further processing on this list.

So, as you see, I think XSLT 2.0 is a very suitable language for solving
this kind of problem. I wouldn't attempt it in XSLT 1.0, however.

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: Rodrigo Segnini [mailto:rsegnini@xxxxxxxxx] 
> Sent: 18 July 2007 02:10
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] search for aligned elements
> 
> Hi
> 
> I have an xml document with the following structure, used for 
> annotation of events:
> 
> <annotation>
>    <body>
>       <track name="some event name">
>          <el index="0" start="38.4" end="39.08">
>          </el>
>          <el index="1" start="39.08" end="43.04">
>          </el>
>       </track>
>       <track name="other name">
>         <el index="0" start="0.04" end="4">
>         </el>
>       </track>
> 
>     (more tracks with more than one element)
> 
>   </body>
> </annotation>
> 
> I am interested in finding elements across tracks falling 
> within the same range as delimited by the start and end 
> fields. There are various rules to determine what means to be 
> within range or not.
> 
> When aligned elements are found, an attribute is added to a 
> third track linking the two.
> 
> Is xsl the right way to go for this procedure (speed is 
> crucial), or is parsing the document and traversing it from 
> another language a better option?
> 
> If the former is appropriate, could I get enough info through 
> this list to accomplish it, as a programmer but neophite to 
> xsl, or would anyone be interested in doing this work on a 
> contract basis? Also suggestions to the latter option are appreciated.
> 
> Thanks...
> 
> Rodrigo

Current Thread