[xsl] regexs, grouping (?) and XSLT2?

Subject: [xsl] regexs, grouping (?) and XSLT2?
From: Bruce D'Arcus <bdarcus@xxxxxxxxxxxxx>
Date: Sat, 7 Aug 2004 15:43:28 -0400
I've got two problems I can't figure out.  I've decided to use XSLT2 to
do this, both because it seems more suited, and because I can use the
exercise to learn a bit about it.

Problem one, which is pretty easy I suppose, but not for me!  I have a
bunch of poorly marked-up XHTML documents that I need converted to
clean semantic code (in part to then generate citation code from).

I have paragraphs like:

<p>A "quote."</p>

I want the quoted text wrapped in XHTML q tags.  The following code
doesn't work, nor did any other variation I tried:

<xsl:template match="xhtml:p">
  <p>
    <xsl:apply-templates mode="quotes"/>
  </p>
</xsl:template>

<xsl:template match="xhtml:p" mode="quotes">
  <xsl:analyze-string select="." regex='"(.*?)"'>
    <xsl:matching-substring>
      <q><xsl:value-of select="regex-group(1)"/></q>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
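
One likely culprit, offered as a guess rather than a diagnosis: the
apply-templates call inside the xhtml:p template selects the children of
the paragraph (text nodes), yet the mode="quotes" template matches
xhtml:p itself, so it would never fire.  A minimal sketch that matches
text() instead is below.  It assumes the input elements are in the XHTML
namespace and that a quotation never spans an inline element; child
elements of the paragraph fall through to the built-in templates in this
mode, so their markup would be dropped.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:template match="xhtml:p">
    <p>
      <!-- selects the children of p, i.e. its text nodes and inline elements -->
      <xsl:apply-templates mode="quotes"/>
    </p>
  </xsl:template>

  <!-- match the text nodes themselves rather than xhtml:p -->
  <xsl:template match="text()" mode="quotes">
    <xsl:analyze-string select="." regex='"(.*?)"'>
      <xsl:matching-substring>
        <!-- regex-group(1) is the text between the two quote marks -->
        <q><xsl:value-of select="regex-group(1)"/></q>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For the sample paragraph above, this should give a p whose content is
the text "A " followed by a q element containing "quote."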

The second problem is more difficult, and is related to bibliographic
and citation formatting.

One of the biggest PITAs I came across while trying to work out my own
stylesheets was figuring out how to format multiple works by the same
author.  E.g.:

citations like (Doe, 1999a, 1999b) or (Smith and Jones, 2001b, 2001d),
where each represents two references with the same author(s) and year.
BTW, these are coded like so in the new DocBook markup:

<citation>
   <biblioref linkend="doe99-1"/>
   <biblioref linkend="doe99-2"/>
</citation>

In reference lists, you (optionally) get this (sequential em-dashes
replacing the creator name(s)):

Doe, John (1999a) ...
---. (1999b) ...
---. (1997) ...

With help from someone far more skilled than I, I arrived at the
(partial) solution below, which seems rather awkward.  (It seems I
needed the same mess of code for both the citations and the bib lists,
though that may be my own ignorance.)

So, the basic logic in author-year citation is to say:

	look up the author name(s) and year (it gets more complicated if
	there is more than one author, of course)
	if they are the same, append a suffix to the year and remove the
	author from the second reference

For bib lists, the same applies, except that the multiple em-dashes
replace only the creator name(s).

So, is there any XSLT2 magic that makes this easier?

<xsl:template match="mods:mods">
<!-- variables -->
  <xsl:variable name="id" select="@ID"/>
  <xsl:variable name="year"
		select="substring(descendant::mods:date|descendant::mods:dateIssued,1,4)" />
  <xsl:variable name="first.author"
		select="mods:name[@type='personal' and position()=1]/mods:namePart[@type='family']|
				mods:name[@type='corporate' and position()=1]/mods:namePart|
				mods:relatedItem[@type='host']/mods:titleInfo[not(@type='abbreviated') and
				not(ancestor::mods:mods/mods:name)]/mods:title"/>
  <xsl:variable name="refposition"
		select="1+count(
				preceding-sibling::mods:mods
					[mods:name[position()=1]/mods:namePart[@type='family']=$first.author]
					[substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
				preceding-sibling::mods:mods
					[mods:name[@type='corporate' and position()=1]/mods:namePart=$first.author]
					[substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
				preceding-sibling::mods:mods
					[mods:relatedItem[@type='host']/mods:titleInfo[not(@type='abbreviated') and
					not(ancestor::mods:mods/mods:name)]/mods:title[position()=1]=$first.author]
					[substring(.//mods:dateIssued|.//mods:date,1,4)=$year])"/>
  <xsl:variable name="refposition.following"
		select="count(
				following-sibling::mods:mods
					[mods:name[position()=1]/mods:namePart[@type='family']=$first.author]
					[substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
				following-sibling::mods:mods
					[mods:name[@type='corporate' and position()=1]/mods:namePart=$first.author]
					[substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
				following-sibling::mods:mods
					[mods:relatedItem[@type='host']/mods:titleInfo[not(@type='abbreviated') and
					not(ancestor::mods:mods/mods:name)]/mods:title=$first.author]
					[substring(.//mods:dateIssued|.//mods:date,1,4)=$year])"/>
  <xsl:message>
    <xsl:value-of select="concat($first.author,', ',$year,': ',
                                 $refposition,' ',$refposition.following)"/>
  </xsl:message>
  <xsl:variable name="suffix">
    <xsl:if test="$refposition+$refposition.following&gt;1">
      <xsl:value-of
        select="substring('abcdefghijklmnopqrstuvwxyz',$refposition,1)"/>
    </xsl:if>
  </xsl:variable>

  <!-- number of names that carry the editor role -->
  <xsl:variable name="editor-number"
		select="count(mods:name[mods:role/mods:roleTerm='editor'])"/>

  <p class="bibentry">
    <xsl:choose>
      <xsl:when test="mods:name">
        <span class="creator">
          <xsl:apply-templates select="mods:name"/>
          <xsl:if test="$editor-number &gt; 0">
            <xsl:choose>
              <!-- plural label only when there is more than one editor -->
              <xsl:when test="$editor-number &gt; 1">
                <xsl:text> (Eds.) </xsl:text>
              </xsl:when>
              <xsl:otherwise>
                <xsl:text> (Ed.) </xsl:text>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:if>
        </span>
      </xsl:when>
      <xsl:when
        test="mods:relatedItem/descendant::mods:issuance='continuing'">
	    <xsl:value-of select="mods:relatedItem/mods:titleInfo/mods:title"/>
      </xsl:when>
    </xsl:choose>
    <xsl:text> (</xsl:text>
    <xsl:value-of select="concat($year,$suffix)"/>
    <xsl:text>) </xsl:text>
    <xsl:apply-templates
      select="mods:titleInfo[not(@type='abbreviated')]"/>
    <xsl:apply-templates select="mods:originInfo"/>
    <xsl:apply-templates select="mods:relatedItem"/>
    <xsl:apply-templates select="mods:genre"/>
    <xsl:apply-templates select="mods:location/mods:physicalLocation"/>
    <xsl:apply-templates select="mods:location/mods:url"/>
    <xsl:text>.</xsl:text>
  </p>
</xsl:template>
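
For comparison, here is a rough sketch of how the same author-year
bookkeeping might look with XSLT 2.0's xsl:for-each-group, which avoids
counting preceding and following siblings by hand.  It is a sketch under
several assumptions, not a drop-in replacement: the records are taken to
be siblings under a mods:modsCollection element, only the first personal
family name is used as the author key (the corporate-name and host-title
fallbacks above are left out), the year comes from mods:dateIssued only,
the input is already in the desired order, and "---" is an ASCII
stand-in for the em-dashes.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3"
    exclude-result-prefixes="mods">

  <!-- assumption: the records are siblings under mods:modsCollection -->
  <xsl:template match="mods:modsCollection">
    <!-- outer grouping: one group per first-author family name -->
    <xsl:for-each-group select="mods:mods"
        group-by="string((mods:name[1]/mods:namePart[@type='family'])[1])">
      <xsl:variable name="author" select="current-grouping-key()"/>
      <!-- inner grouping: within one author, one group per year -->
      <xsl:for-each-group select="current-group()"
          group-by="substring(string((.//mods:dateIssued)[1]), 1, 4)">
        <xsl:variable name="year" select="current-grouping-key()"/>
        <!-- true only for the author's first year group -->
        <xsl:variable name="first-year" select="position() eq 1"/>
        <!-- true when two or more records share this author and year -->
        <xsl:variable name="several" select="count(current-group()) gt 1"/>
        <xsl:for-each select="current-group()">
          <p class="bibentry">
            <!-- name on the author's first entry only, "---" afterwards -->
            <xsl:value-of select="if ($first-year and position() eq 1)
                                  then $author else '---'"/>
            <xsl:text> (</xsl:text>
            <xsl:value-of select="$year"/>
            <!-- suffix a, b, c ... only when the author-year is shared -->
            <xsl:if test="$several">
              <xsl:number value="position()" format="a"/>
            </xsl:if>
            <xsl:text>)</xsl:text>
          </p>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

The grouping keys are bound to variables before the inner xsl:for-each
so that the body does not depend on current-grouping-key() staying in
scope there.  The same two-level grouping could presumably be reused for
the citations by selecting the referenced records instead of mods:mods.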
