Re: [xsl] regexs, grouping (?) and XSLT2?

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 10 Aug 2004 09:45:36 +0100

Hi Bruce,

I've spent a bit of time on this since it uses some interesting XSLT
2.0 features.

> So, the para output would be something like:
>
> <p>Some citations: (Doe, 1999a, 1999b; Doe and Jones, 1999)</p>
>
> And in the bib list:
>
> Doe (1999a)
> -----. (1999b)
> Doe and Jones (1999)

These are both examples of the same grouping logic, just with
different output as a result.

There are actually two levels to the grouping.

First, you need to group the <mods> elements based on the named
authors -- all the named authors, since you want papers by "John Doe"
to be grouped separately from papers by "John Doe and Jane Jones".
(And presumably you want papers by "John Doe" to be grouped separately
from those by "James Doe", though I'm not sure how the output would
show that.)

Second, you need to group the <mods> elements for a particular author
by year, so that you can tell when you need to add the "a", "b", "c"
etc. after the year.

The first level of grouping needs a compound grouping key -- a
grouping key where the value is constructed from multiple elements or
attributes in your data. Grouping keys in XSLT 2.0 have to be atomic
values (usually strings), so a compound grouping key means that you
have to construct a string that's different for each group you want to
identify. For example, you might use a string that looks like:

  "Doe,John"

for papers written by John Doe by himself, and:

  "Doe,John;Jones,Jane"

for papers written by John Doe and Jane Jones.

When you have a compound grouping key in XSLT 2.0, it's a good idea to
write a function to construct the grouping key, because the code that
does it is likely to be fairly complex, and functions allow you to use
XSLT to construct the grouping key rather than being limited to XPath.

In this case, you might write a function like the following, which
constructs strings in the formats given above based on a particular
<mods> element:

<xsl:function name="mods:grouping-key" as="xs:string">
  <xsl:param name="mods" as="element(mods:mods)" />
  <xsl:value-of separator=";">
    <xsl:for-each select="$mods/mods:name">
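      <!-- xsl:sequence returns strings here; a nested xsl:value-of would
           create adjacent text nodes, which are merged before the
           separator is applied -->
      <xsl:sequence
        select="string-join((mods:namePart[@type = 'family'],
                             mods:namePart[@type = 'given']), ',')" />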
    </xsl:for-each>
  </xsl:value-of>
</xsl:function>

[Note the two new methods of constructing strings from lists with
particular separators in XSLT 2.0: string-join() and using the content
of <xsl:value-of> alongside the separator attribute.]
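
If you'd rather keep the whole key in XPath, here's a minimal sketch of
the same idea using nested string-join() calls. The function name
mods:grouping-key-xpath and the MODS v3 namespace URI are my
assumptions, as is the shape of the input (each <mods:name> holding one
family and one given <mods:namePart>):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mods="http://www.loc.gov/mods/v3">

  <xsl:output method="text"/>

  <!-- Hypothetical variant: builds the compound key entirely in XPath.
       Inner string-join: "family,given" for one name.
       Outer string-join: the names of one record, separated by ";". -->
  <xsl:function name="mods:grouping-key-xpath" as="xs:string">
    <xsl:param name="mods" as="element(mods:mods)"/>
    <xsl:sequence select="string-join(
        for $n in $mods/mods:name
        return string-join(($n/mods:namePart[@type = 'family'],
                            $n/mods:namePart[@type = 'given']), ','),
        ';')"/>
  </xsl:function>

  <!-- Print the key of every record, one per line, to inspect the keys. -->
  <xsl:template match="/">
    <xsl:for-each select="//mods:mods">
      <xsl:value-of select="mods:grouping-key-xpath(.)"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For a record with the names Doe/John and Jones/Jane, in that order, this
should print "Doe,John;Jones,Jane". The trade-off is readability: the
nested expression is compact, but it gets harder to follow once the key
needs more parts.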

Now that you've got a grouping key sorted out, you can do the first level
of grouping using <xsl:for-each-group>. Use either group-by (if the
<mods> elements aren't sorted so that <mods> that should appear in the
same group are next to each other) or group-adjacent (if the <mods>
elements are sorted in that way):

  <xsl:for-each-group select="$mods"
                      group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()" />
    ...
  </xsl:for-each-group>

This is the level at which you want to create the name(s) of the
author(s) of the <mods>.
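
For comparison, here's a rough sketch of the group-adjacent form, which
only makes sense if the <mods> elements already arrive with records by
the same author(s) next to each other. The function mods:family-key is
a simplified stand-in I've made up for this example (family names
only), and the namespace URI is an assumption:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mods="http://www.loc.gov/mods/v3">

  <xsl:output method="text"/>

  <!-- Hypothetical simplified key: family names joined with ";". -->
  <xsl:function name="mods:family-key" as="xs:string">
    <xsl:param name="mods" as="element(mods:mods)"/>
    <xsl:sequence
      select="string-join($mods/mods:name/mods:namePart[@type = 'family'], ';')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- group-adjacent starts a new group whenever the key changes, so
         the groups come out in document order and no xsl:sort is used. -->
    <xsl:for-each-group select="//mods:mods"
                        group-adjacent="mods:family-key(.)">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

If the input isn't sorted, the same author can turn up in more than one
group with this form, which is why group-by is the safer default.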

Within this level of grouping, you need another <xsl:for-each-group>
to group by the year. The <mods> elements to be grouped are those in
the current group (of <mods> elements by the same author(s)):

  <xsl:for-each-group select="$mods" group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()"/>
    ...
    <xsl:for-each-group select="current-group()"
                        group-by="xs:gYear(mods:originInfo/mods:dateIssued)">
      <xsl:sort select="current-grouping-key()" />
      ...
    </xsl:for-each-group>
  </xsl:for-each-group>

[Note that this shouldn't be allowed by a Basic XSLT 2.0 processor
because such processors shouldn't support the xs:gYear datatype. I've
tested this with Saxon 8.0B, and it works OK there, but that might
change when Mike makes it conformant. You could use xs:integer()
rather than xs:gYear() here to make it work with conformant Basic XSLT
2.0 processors.]

I don't think you want to generate any output at this level, but you
then want to iterate over the members of the current group (which are
the <mods> elements with the same author and the same date) to create
output for each <mods> element:

  <xsl:for-each-group select="$mods" group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()"/>
    ...
    <xsl:for-each-group select="current-group()"
                        group-by="xs:gYear(mods:originInfo/mods:dateIssued)">
      <xsl:sort select="current-grouping-key()" />
      <xsl:for-each select="current-group()">
        ...
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:for-each-group>

Within this output, you'll want to include the year itself, and a
letter if there's more than one <mods> element in the group. You can
test whether there's more than one <mods> element in the group by
checking the value of last() within the <xsl:for-each> -- if it's more
than 1 then there's more than one element in the group. You can then
create the letter using <xsl:number> with the value attribute
selecting the position of the current <mods> element within the group:

  <xsl:for-each-group select="$mods" group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()"/>
    ...
    <xsl:for-each-group select="current-group()"
                        group-by="xs:gYear(mods:originInfo/mods:dateIssued)">
      <xsl:sort select="current-grouping-key()" />
      <xsl:for-each select="current-group()">
        ...
        <xsl:value-of select="mods:originInfo/mods:dateIssued" />
        <xsl:if test="last() > 1">
          <xsl:number value="position()" format="a" />
        </xsl:if>
        ...
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:for-each-group>
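
To try the pattern on its own, here's a small self-contained sketch
that just prints one line per <mods> element. It uses xs:integer()
rather than xs:gYear(), as suggested in the note above, and assumes
each record has exactly one <mods:dateIssued> holding a plain year such
as "1999". The function mods:family-key is a simplified stand-in of my
own (family names only), and the namespace URI is an assumption:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mods="http://www.loc.gov/mods/v3">

  <xsl:output method="text"/>

  <!-- Hypothetical simplified key: family names joined with ";". -->
  <xsl:function name="mods:family-key" as="xs:string">
    <xsl:param name="mods" as="element(mods:mods)"/>
    <xsl:sequence
      select="string-join($mods/mods:name/mods:namePart[@type = 'family'], ';')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Level 1: one group per distinct set of authors. -->
    <xsl:for-each-group select="//mods:mods" group-by="mods:family-key(.)">
      <xsl:sort select="current-grouping-key()"/>
      <!-- Keep the outer key: inside the inner group,
           current-grouping-key() returns the year instead. -->
      <xsl:variable name="authors" select="current-grouping-key()"/>
      <!-- Level 2: within one set of authors, one group per year. -->
      <xsl:for-each-group select="current-group()"
          group-by="xs:integer(mods:originInfo/mods:dateIssued)">
        <xsl:sort select="current-grouping-key()"/>
        <xsl:for-each select="current-group()">
          <xsl:value-of select="$authors"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="mods:originInfo/mods:dateIssued"/>
          <!-- Add a, b, c... only when the year is shared. -->
          <xsl:if test="last() > 1">
            <xsl:number value="position()" format="a"/>
          </xsl:if>
          <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

With two 1999 records by Doe and one 1999 record by Doe and Jones, I'd
expect lines such as "Doe 1999a", "Doe 1999b" and "Doe;Jones 1999".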

That's the basic pattern for both the citations and the bibliography.
Here are the templates for each; I won't go into detail explaining
them since it's mostly obvious. Note that I use a key to access the
<mods> elements associated with a particular <biblioref> within a
<citation>. Also note that the fact that there are these two levels of
grouping makes things slightly harder from the perspective of knowing
where to put separators and so on.

<xsl:key name="mods" match="mods:mods" use="@ID" />

<xsl:template match="db:citation">
  <xsl:text>(</xsl:text>
  <xsl:variable name="mods" select="key('mods', db:biblioref/@linkend)" />
  <xsl:for-each-group select="$mods" group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()"/>
    <xsl:apply-templates select="." mode="names" />
    <xsl:text>, </xsl:text>
    <xsl:for-each-group select="current-group()"
                        group-by="xs:gYear(mods:originInfo/mods:dateIssued)">
      <xsl:sort select="current-grouping-key()" />
      <xsl:for-each select="current-group()">
        <xsl:value-of select="mods:originInfo/mods:dateIssued" />
        <xsl:if test="last() > 1">
          <xsl:number value="position()" format="a" />
          <xsl:if test="position() != last()">, </xsl:if>
        </xsl:if>
      </xsl:for-each>
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each-group>
    <xsl:if test="position() != last()">; </xsl:if>
  </xsl:for-each-group>
  <xsl:text>)</xsl:text>
</xsl:template>

<xsl:template match="mods:modsCollection">
  <xsl:variable name="mods" select="mods:mods" />
  <xsl:for-each-group select="$mods" group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()"/>
    <xsl:apply-templates select="." mode="names" />
    <xsl:text> </xsl:text>
    <xsl:for-each-group select="current-group()"
                        group-by="xs:gYear(mods:originInfo/mods:dateIssued)">
      <xsl:sort select="current-grouping-key()" />
      <xsl:variable name="first" as="xs:boolean" select="position() = 1" />
      <xsl:for-each select="current-group()">
        <xsl:if test="not($first and position() = 1)">-----. </xsl:if>
        <xsl:value-of select="mods:originInfo/mods:dateIssued" />
        <xsl:if test="last() > 1">
          <xsl:number value="position()" format="a" />
        </xsl:if>
        <xsl:text>&#xA;</xsl:text>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:for-each-group>
</xsl:template>

<xsl:template match="mods:mods" mode="names">
  <xsl:for-each select="mods:name">
    <xsl:value-of select="mods:namePart[@type = 'family']" />
    <xsl:choose>
      <xsl:when test="position() = last()" />
      <xsl:when test="position() = last() - 1"> and </xsl:when>
      <xsl:otherwise>, </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>
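
The function, key and templates above all need to sit inside a
stylesheet that declares the prefixes they use. Here's a skeleton for
that; the namespace URIs for mods: and db: are my assumptions (MODS v3
and the DocBook namespace), so change them to whatever your documents
actually use:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mods="http://www.loc.gov/mods/v3"
    xmlns:db="http://docbook.org/ns/docbook"
    exclude-result-prefixes="xs mods db">

  <!-- xs:   needed for as="xs:string", xs:boolean and xs:gYear() -->
  <!-- mods: assumed MODS v3 namespace; also reused here as the
             namespace of the mods:grouping-key function -->
  <!-- db:   assumed DocBook namespace for citation and biblioref -->

  <!-- The mods:grouping-key function, the "mods" key declaration and
       the three templates go here. -->

</xsl:stylesheet>
```

Stylesheet functions have to be in a namespace, which is why the
grouping-key function carries a prefix at all; a separate prefix of
your own would work equally well.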

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
