Re: [xsl] regexs, grouping (?) and XSLT2?

Subject: Re: [xsl] regexs, grouping (?) and XSLT2?
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 10 Aug 2004 14:25:07 +0100

Hi Bruce,

> This is exactly the problem that prompted my other post on xs:date.
> I can't use either xs:gYear or xs:date because some dates are in the
> form YYYY and some are YYYY-MM, and still others YYYY-MM-DD. So, I
> had to change to substring(.,1,4) to get the stylesheet to compile.
>
> I really wish xs:date would count all of these as valid.

Yes, well, requests for changes to the XML Schema type hierarchy
aren't at all rare.

You can, of course, define a datatype that accepts xs:date
(YYYY-MM-DD), xs:gYearMonth (YYYY-MM) and xs:gYear (YYYY) formats in a
schema such as:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.loc.gov/mods/v3">

<xs:simpleType name="date">
  <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear" />
</xs:simpleType>

</xs:schema>

Say this is mods.xsd. In a schema-aware processor (such as Saxon
8.0SA), you can import this schema with:

<xsl:import-schema namespace="http://www.loc.gov/mods/v3"
                   schema-location="mods.xsd" />

But this won't really help you with your grouping problem at all.

For one thing, you can't cast to a union type or name one in an as
attribute, so you can't do something like:

  <xsl:variable name="date" as="mods:date"
    select="mods:originInfo/mods:dateIssued" />

If you wanted to have the processor assign the correct type to the
<mods:dateIssued> element, you'd have to make a copy of it with the
correct type assignment, and then get its typed value, as in:

  <xsl:variable name="temp" as="element(mods:dateIssued, mods:date)">
    <mods:dateIssued xsl:type="mods:date">
      <xsl:value-of select="mods:originInfo/mods:dateIssued" />
    </mods:dateIssued>
  </xsl:variable>
  <xsl:variable name="date" as="xdt:anyAtomicType"
                select="data($temp)" />

(Or, of course, you could create a schema for the entire document and
make sure the source is validated against that schema, so that the
<mods:dateIssued> element is annotated with the correct type from the
start.)
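
As a rough sketch of that second route, the document schema would
declare the element with the union type. The element declaration below
is my own illustration (the real MODS schema declares dateIssued
differently), and the declarations for the rest of the document are
left out:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mods="http://www.loc.gov/mods/v3"
           targetNamespace="http://www.loc.gov/mods/v3"
           elementFormDefault="qualified">

  <!-- the union type from mods.xsd -->
  <xs:simpleType name="date">
    <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear" />
  </xs:simpleType>

  <!-- hypothetical declaration, for illustration only -->
  <xs:element name="dateIssued" type="mods:date" />

  <!-- declarations for mods:mods, mods:originInfo and so on omitted -->

</xs:schema>

If the source has been validated against a schema along those lines,
data(mods:originInfo/mods:dateIssued) should give you an xs:date,
xs:gYearMonth or xs:gYear value directly, without the temporary copy.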

Second, having the processor assign the correct type doesn't really
buy you anything anyway, because there's precious little support for
the xs:gHorribleKludge datatypes in XPath 2.0. If you were
constructing a function to group the <mods> elements by year, it would
look something like:

<xsl:function name="mods:year" as="xs:integer">
  <xsl:param name="mods" as="element(mods:mods)" />
  <xsl:variable name="temp" as="element(*, mods:date)">
    <mods:dateIssued xsl:type="mods:date">
      <xsl:value-of select="$mods/mods:originInfo/mods:dateIssued" />
    </mods:dateIssued>
  </xsl:variable>
  <xsl:variable name="date" as="xdt:anyAtomicType" select="data($temp)" />
  <xsl:choose>
    <xsl:when test="$date instance of xs:date">
      <xsl:sequence select="year-from-date($date)" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="xs:integer(substring(string($date), 1, 4))" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

A lighter-weight version would be the following, which just tests
whether the date can be cast to an xs:date, and if so does the cast and
uses the year-from-date() function.

<xsl:function name="mods:year" as="xs:integer">
  <xsl:param name="mods" as="element(mods:mods)" />
  <xsl:variable name="date" as="element(mods:dateIssued)"
    select="$mods/mods:originInfo/mods:dateIssued" />
  <xsl:choose>
    <xsl:when test="$date castable as xs:date">
      <xsl:sequence select="year-from-date(xs:date($date))" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="xs:integer(substring($date, 1, 4))" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

Contrast this with avoiding the datatyping altogether:

<xsl:function name="mods:year" as="xs:integer">
  <xsl:param name="mods" as="element(mods:mods)" />
  <xsl:sequence select="xs:integer(substring($mods/mods:originInfo/mods:dateIssued, 1, 4))" />
</xsl:function>

This function is so simple, you don't even need it to be a function;
you can just put the value of the select attribute of the above
<xsl:sequence> into the group-by attribute and be done.
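
For instance, a minimal sketch of grouping by year directly. The <h2>
heading is only there for illustration, and the expression assumes
that every <mods> element has exactly one <mods:dateIssued> starting
with a four-digit year; anything else will raise an error:

<xsl:for-each-group select="mods:mods"
    group-by="xs:integer(substring(mods:originInfo/mods:dateIssued, 1, 4))">
  <!-- sort the groups by year; xsl:sort has to come first -->
  <xsl:sort select="current-grouping-key()" />
  <h2><xsl:value-of select="current-grouping-key()" /></h2>
  <xsl:apply-templates select="current-group()" />
</xsl:for-each-group>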

Another point to be made is that if you have a union type, there's no
way to compare the values within that type with each other: you can't
compare an xs:date with an xs:gYear, so you can't sort them into the
order that you'd expect.
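
One workaround, offered as a sketch rather than a recommendation, is
to sort on the string value instead of the typed value. All three
formats start with a four-digit year and are zero-padded, so comparing
them as text puts them in roughly chronological order, with a bare
year sorting before any month or day within that year. I'm assuming
here that the collation in use orders digits in the usual way, and
that there's one <mods:dateIssued> per <mods>:

<xsl:apply-templates select="mods:mods">
  <!-- compare the lexical forms as text, not as typed values -->
  <xsl:sort select="string(mods:originInfo/mods:dateIssued)"
            data-type="text" />
</xsl:apply-templates>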

[FWIW, I thought that the schema-aware version would turn out to be
simple, since Mike's been going on about how much easier life is with
schema-awareness; I'm surprised at how complicated it turns out to be,
and it's possible that I'm missing some easier schema-aware method.]

> My other issue is how to use your stylesheets to get each reference
> -- instead of each group -- wrapped in proper tags.

If you generate the element within the <xsl:for-each-group> then
you'll get it wrapped around each group. If you generate the element
within the <xsl:for-each> (which is iterating over individual <mods>
elements) then you'll get it wrapped around each reference. If you
want the names to appear within the first reference rather than at the
group level, then move it into the <xsl:for-each> and use a position()
= 1 check to tell whether you're in the first reference for the group.
(This is complicated a little because you're using two levels of
grouping.)

Something like:

<xsl:template match="mods:modsCollection">
  <xsl:variable name="mods" select="mods:mods" />
  <xsl:for-each-group select="$mods" group-by="mods:grouping-key(.)">
    <xsl:sort select="current-grouping-key()"/>
    <xsl:for-each-group select="current-group()"
      group-by="xs:integer(substring(mods:originInfo/mods:dateIssued,1,4))">
      <xsl:sort select="current-grouping-key()" />
      <xsl:variable name="year" as="xs:integer"
                    select="current-grouping-key()" />
      <xsl:variable name="first" as="xs:boolean"
                    select="position() = 1" />
      <xsl:for-each select="current-group()">
        <p>
          <xsl:choose>
            <xsl:when test="$first and position() = 1">
              <xsl:apply-templates select="." mode="names" />
            </xsl:when>
            <xsl:otherwise>----. </xsl:otherwise>
          </xsl:choose>
          <span class="year">
            <xsl:value-of select="$year" />
            <xsl:if test="last() > 1">
              <xsl:number value="position()" format="a" />
            </xsl:if>
          </span>
          ...
        </p>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:for-each-group>
</xsl:template>

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
