[xsl] Collating riffled lists

Subject: [xsl] Collating riffled lists
From: "Mat Myszewski" <mmyszew@xxxxxxxxxxx>
Date: Mon, 29 Sep 2003 13:42:56 -0400
This is my first XSLT project. I have a recursive solution to a problem
which I hope one of you can improve on.

This is an abstraction of a problem that arose in the context of scraping a PDF:

PDF->Adobe->HTML->tidy->XML->scrape with XSLT->...

The PDF->HTML conversion, or for that matter, lassoing the text in Acrobat
Reader, cutting and pasting it, yields a different order than what is
displayed on screen by Acrobat Reader. It's not so badly mangled that it
can't be recovered. However, related items are no longer near one another. I
need to recover the original relationship between the related items.

I'm hoping someone can come up with a better solution than the one I present
below, which I believe is O(n squared), where n is large (the original
document is 170+ pages). I've considered outputting the a's and b's into two
result files with two xslt programs and processing those. I think XSLT 1.1
would allow this to be done within a single xslt program by building two
node sets, but I'd like to stick to 1.0, if possible.

XML source:

<?xml version="1.0"?>
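
The element content of the sample input did not survive in this copy of the message. A hypothetical input that is consistent with the stylesheet's /list/b[$ix] path and with the output shown further down might look like the following. The element names come from the stylesheet; the grouping of a's and b's into runs of three is an assumption, not the original data.

```xml
<!-- Hypothetical sample input: runs of a's followed by runs of b's.
     The i-th a (in document order) is meant to pair with the i-th b. -->
<list>
  <a>a1</a>
  <a>a2</a>
  <a>a3</a>
  <b>b1</b>
  <b>b2</b>
  <b>b3</b>
  <a>a4</a>
  <a>a5</a>
  <a>a6</a>
  <b>b4</b>
  <b>b5</b>
  <b>b6</b>
</list>
```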


Stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!-- match first a and make top level call to recursive template -->
<xsl:template match="a[1]">
    <xsl:call-template name="do_a">
        <xsl:with-param name="ix" select="1" />
    </xsl:call-template>
</xsl:template>

<!-- recursive template counts a's -->
<xsl:template name="do_a">
    <xsl:param name="ix" />

    <!-- output this a (the second select is assumed; it was cut off in the archived copy) -->
    <xsl:value-of select="$ix" /><xsl:text>: </xsl:text><xsl:value-of select="." />

    <!-- output corresponding b -->
    <xsl:text> </xsl:text><xsl:copy-of select="/list/b[$ix]/text()" />

    <!-- end the line (assumed; the archived copy lost this step, but the output has one pair per line) -->
    <xsl:text>&#10;</xsl:text>

    <!-- This for-each moves to the next a; doesn't loop. -->
    <xsl:for-each select="following-sibling::a[1]">

        <!-- increment counter and output rest of a's -->
        <xsl:call-template name="do_a">
            <xsl:with-param name="ix" select="$ix + 1" />
        </xsl:call-template>
    </xsl:for-each>
</xsl:template>

<!-- suppress other output -->
<xsl:template match="text()" />

</xsl:stylesheet>


And this is the output (Saxon 6.5.1 with XFactor GUI):

<?xml version="1.0" encoding="utf-8"?>
1: a1 b1
2: a2 b2
3: a3 b3
4: a4 b4
5: a5 b5
6: a6 b6
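
One possible way to avoid the recursion while staying in XSLT 1.0 is sketched below. It assumes that the i-th a always pairs with the i-th b and that both are children of a list root element. The b's are bound to a variable once, and each a looks up its partner by position(). The variable names bs and i are made up for this sketch. Whether $bs[$i] is a constant-time lookup depends on the processor, so this is not guaranteed to do better than O(n squared).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!-- text output, so no XML declaration is written -->
<xsl:output method="text" />

<xsl:template match="/list">
    <!-- collect all b's once, in document order (hypothetical variable name) -->
    <xsl:variable name="bs" select="b" />

    <!-- visit every a in document order -->
    <xsl:for-each select="a">
        <!-- position() is this a's index among the selected a's -->
        <xsl:variable name="i" select="position()" />

        <!-- "index: a-text b-text" followed by a newline -->
        <xsl:value-of select="concat($i, ': ', ., ' ', $bs[$i], '&#10;')" />
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Because $i holds a number, the predicate [$i] is positional and selects the i-th node of $bs. An a with no matching b simply gets an empty string after the space.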


            Mat M.

