Re: [xsl] search for aligned elements

Subject: Re: [xsl] search for aligned elements
From: "Rodrigo Segnini" <rsegnini@xxxxxxxxx>
Date: Thu, 19 Jul 2007 13:20:15 +0900
Thank you Michael for your provided solution.
You correctly identified one rule to determine alignment. I also want
to consider as aligned, cases of partial overlapping.
Not all comparisons among tracks are necessary; I overgeneralized here.
There is only one track, sparsely populated (that is, elements may not
be contiguous), which is compared against the rest, which when
populated, its elements are contiguous and of the same length. Let me
try to ascii-exemplify:

                          |----Main Track--------|
Case 0                    |--aligned--|
Case 1             |--aligned--|
Case 2                               |--aligned--|
Case 3  |--not aligned--||--aligned--|

two cases 0 OK: |--alig--||--alig--|

Case 0: overlap is complete
Cases 1 & 2: start || end out-of-bounds, yet main element covers > 50%
Case 3: not aligned because it covers < 50% and the one that follows
is a case 0, 1, or 2, or if neither, then the one overlapping the most
is selected (there is always an overlapping element).

Once an overlap is found, elements in any subsequent track (if any)
will have the same length, so they can be marked without having to
test again for alignment.

In this way the test takes place only once for each element in the main track.

Thanks again,

Rodrigo


On 7/19/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
You haven't really explained exactly what you mean by "aligned elements". My
guess is that you're looking for overlapping events: that is two events E,F
such that E/@start <= F/@end and E/@end >= F/@start (or vice versa). Clearly
a naive approach - examining all pairs of events - will have O(n^2)
performance, and it seems you want to do better than this. A better approach
is to create a sorted sequence of events and then for each event examine its
near neighbours to see if there's an overlapping event in a different track.
If you sort first by start time and then by end time, you can find the
events that overlap a given event E by searching forwards until you find an
event whose start time is later than the end time of E. (It might also have
been found as an overlapping event for some previous event). Because you
want to break out of the loop at this point (for efficiency) you're going to
have to use recursion. So in XSLT 2.0:

<xsl:template match="body">
 <xsl:variable name="sorted-events" as="element(e1)*">
   <xsl:perform-sort select="track/e1"/>
     <xsl:sort select="number(@start)"/>
     <xsl:sort select="number(@end)"/>
   </xsl:perform-sort>
 </xsl:variable>
 <xsl:variable name="overlapping-pairs" as="element(pair)">
   <xsl:for-each select="$sorted-events">
     <xsl:sequence select="f:find-overlapping-events(.,
                               subsequence($sorted-events,
position()+1))"/>
   </xsl:for-each>
 </xsl:variable>
 <!-- now process the pairs. For now, just output them -->
 <xsl:copy-of select="$overlapping-pairs"/>
</xsl:template>

<!-- Find the events in $others that overlap a given event $this -->

<xsl:function name="f:find-overlapping-events">
 <xsl:param name="this" as="element(e1)"/>
 <xsl:param name="others" as="element(e1)*"/>
 <xsl:if test="exists($others) and f:overlaps($this, $others[1])">
       <pair track1="{$this/../@name}"
             event1="{$this/some_event_id}"
             track2="{$other/../@name}"
             event2="{$other/some_event_id}"/>
       <xsl:sequence select="f:find-overlapping-events($this,
subsequence($others, 2))"/>
 </xsl:if>
</xsl:function>

<!-- Test whether $first overlaps $second

<xsl:function name="f:overlaps" as="xs:boolean">
 <xsl:param name="first" as="element(e1)"/>
 <xsl:param name="second" as="element(e1)"/>
 <xsl:sequence select="number($second/@start) lt number($first/@end)"/>
</xsl:function>

Not tested!

This outputs a sequence of <pair> elements identifying the overlapping event
pairs, you can then do further processing on this list.

So, as you see, I think XSLT 2.0 is a very suitable language for solving
this kind of problem. I wouldn't attempt it in XSLT 1.0, however.

Michael Kay
http://www.saxonica.com/


> -----Original Message----- > From: Rodrigo Segnini [mailto:rsegnini@xxxxxxxxx] > Sent: 18 July 2007 02:10 > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx > Subject: [xsl] search for aligned elements > > Hi > > I have an xml document with the following structure, used for > annotation of events: > > <annotation> > <body> > <track name="some event name"> > <el index="0" start="38.4" end="39.08"> > </el> > <el index="1" start="39.08" end="43.04"> > </el> > </track> > <track name="other name"> > <el index="0" start="0.04" end="4"> > </el> > </track> > > (more tracks with more than one element) > > </body> > </annotation> > > I am interested in finding elements across tracks falling > within the same range as delimited by the start and end > fields. There are various rules to determine what means to be > within range or not. > > When aligned elements are found, an attribute is added to a > third track linking the two. > > Is xsl the right way to go for this procedure (speed is > crucial), or is parsing the document and traversing it from > another language a better option? > > If the former is appropriate, could I get enough info through > this list to accomplish it, as a programmer but neophite to > xsl, or would anyone be interested in doing this work on a > contract basis? Also suggestions to the latter option are appreciated. > > Thanks... > > Rodrigo

Current Thread