Re: [xsl] sorting titles w stopwords but w/o value in every title node

From: "cking" <cking@xxxxxxxxxx>
Date: Thu, 2 Sep 2004 10:44:24 +0200

Susan,

> I'm sorry for the delay in responding.  A large tree fell on my house 
> about 1 AM Tuesday morning and I have been away from work finding
> a tree service and contractors, etc.  It's  quite a challenge.

Wow, I can believe that... and I thought this stylesheet was quite a challenge!

I have been thinking about the sorting problem:

1. If a record doesn't have a title, we can look it up (by its doc-number).
    Let's call the result "found-title".
2. The sort procedure should use the "found-title" rather than the actual title,
    or more precisely the "found-title-without-stopwords".
3. The output still shows the actual title (empty, if it's empty).
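
The lookup in step 1 could be sketched with an xsl:key. This is only an
illustration, not part of the stylesheet below: the key name "found-title"
is made up here, and the element names (section-02, title, doc-number) are
the ones used in the stylesheet further down.

<!-- top level: index every non-empty title by its record's doc-number -->
<xsl:key name="found-title" match="section-02/title[string()]" use="../doc-number"/>

<!-- inside a template, with a title element as the context node:
     outputs the first non-empty title (in document order) among the
     records that share this record's doc-number -->
<xsl:value-of select="key('found-title', ../doc-number)"/>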

Problem: we can't use variables or if-constructs to compute the sort key,
because xsl:sort must be the first child of xsl:for-each. The solution so far
sorts on the "actual-title-without-stopwords" (which can be empty), computed
inside the select attribute by means of the "Becker method" [1]:

<xsl:sort select="concat(substring(substring-after(.,' '), 0 div boolean
    ($stop-words[starts-with(translate(current(), $uppercase, $lowercase), 
    concat(translate(., $uppercase, $lowercase), ' '))])), substring(., 0 div not
    ($stop-words[starts-with(translate(current(), $uppercase, $lowercase), 
    concat(translate(., $uppercase, $lowercase), ' '))])))"/>
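
In case that expression looks cryptic, here is the same trick in its
smallest form, with a literal string and a simple starts-with() test standing
in for the stop-word lookup (both are just placeholders for illustration).
When the test is true, 0 div boolean(...) is 0 and substring(x, 0) returns
all of x; when it is false, the division is 0 div 0 = NaN and
substring(x, NaN) returns the empty string. So exactly one of the two
substring() calls contributes to the concat():

<!-- the test is true here, so this outputs "American city" -->
<xsl:value-of select="concat(
    substring(substring-after('The American city', ' '),
        0 div boolean(starts-with('the american city', 'the '))),
    substring('The American city',
        0 div not(starts-with('the american city', 'the '))))"/>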

I tried to put a "found-title" inside the xsl:sort select, but I couldn't make it work.

> The processor is Saxon but it's being called from within another application.
> I do not believe I can do a two-step process.

But Saxon does support exsl:node-set [2] so it should be possible to generate a 
temporary tree (pun not intended!!) and transform that in a second pass,
within one stylesheet. You could create a global variable with a structure
like

<sort-titles>
    <title doc-number="53690">american artist</title>
    <title doc-number="57769">american city &amp; country</title>
    <title doc-number="58345">american demographics</title>
    <title doc-number="58615">forbes.</title>
</sort-titles>

and then use

<xsl:sort select="exsl:node-set($sort-titles)/*[@doc-number=$doc-number]"/>

Using exsl:node-set also means that you don't need the "Becker hack" anymore, 
improving maintainability. Here's a stylesheet that sorts titles correctly:

<xsl:stylesheet version="1.0"
   xmlns="http://www.w3.org/1999/xhtml"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:exsl="http://exslt.org/common"
   xmlns:sw="http://my.stopwords/sw"
   extension-element-prefixes="exsl sw"
   >

 <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
  />

 <sw:stop>
  <sw:word>the</sw:word>
  <sw:word>a</sw:word>
  <sw:word>an</sw:word>
 </sw:stop>

 <xsl:variable name="stop-words" select="document('')/xsl:stylesheet/sw:stop/sw:word"/>
 <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
 <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>

 <xsl:variable name="sort-titles">
  <xsl:for-each select="//section-02">
   <xsl:if test="string(title)">
    <title doc-number="{doc-number}">
     <xsl:variable name="lower-title" select="translate(title, $uppercase, $lowercase)"/>
     <xsl:choose>
      <xsl:when test="$stop-words[starts-with($lower-title, concat(translate(., $uppercase, $lowercase), ' '))]">
       <xsl:value-of select="substring-after($lower-title,' ')"/>
      </xsl:when>
      <xsl:otherwise>
       <xsl:value-of select="$lower-title"/>
      </xsl:otherwise>
     </xsl:choose>
    </title>
   </xsl:if>
  </xsl:for-each>
 </xsl:variable>

 <xsl:template match="/">
  <html xmlns="http://www.w3.org/1999/xhtml"><head><title>sort without stop words</title></head><body>
   <table border="1">
    <tr>
     <th>doc-number</th>
     <th>title</th>
     <th>description</th>
     <th>arrival-date</th>
    </tr> 
    <xsl:for-each select="//section-02/title">
     <xsl:sort select="exsl:node-set($sort-titles)/*[@doc-number = current()/../doc-number]"/>
     <xsl:sort select="number(concat(substring(../arrival-date, 7,4),
      substring(../arrival-date, 1,2), substring(../arrival-date, 4,2)))"
      order="descending"/>
     <tr>
      <td><xsl:value-of select="../doc-number"/></td>
      <td><xsl:value-of select="."/></td>
      <td><xsl:value-of select="../description"/></td>
      <td><xsl:value-of select="../arrival-date"/></td>
     </tr>
    </xsl:for-each>
   </table>
  </body></html>
 </xsl:template>

</xsl:stylesheet>

Saxon 6.5.3 output:

    doc-number  title                        description               arrival-date
    53690       American Artist              v.68:no.738(2004:Jan.)    02/26/2004
    57769                                    v.119:no.3(2004:Mar.)     03/25/2004
    57769       The American city & country  v.119:no.1(2004:Jan.)     02/11/2004
    58345                                    v.26:no.3(2004:Apr.)      04/12/2004
    58345                                    v.26:no.2(2004:Mar.)      03/06/2004
    58345       American demographics        v.26:no.1(2004:Feb.)      02/05/2004
    58615                                    v.173:no.5(2004:Mar.15)   03/15/2004
    58615                                    v.173:no.2(2004:Feb. 02)  01/21/2004
    58615       Forbes.                      v.173:no.1(2004:Jan. 12)  01/12/2004

The records without a title are now sorted into their correct positions.
One problem seems to remain: within each doc-number group, the title tends
to show up on the last row rather than the first, because the dates are
sorted descending and, in this data, the record that carries the title is
the oldest one. But that shouldn't be too difficult to solve.

I hope this will work in your application... and I wish you strength and
everything else you need to solve the other tree challenge too!


Best regards
Anton Triest

[1] http://www.biglist.com/lists/xsl-list/archives/200008/msg00525.html
[2] http://exslt.org/exsl/functions/node-set/index.html
