[xsl] sorting a list of titles after removal of stopwords and special characters

Subject: [xsl] sorting a list of titles after removal of stopwords and special characters
From: bpytlikz@xxxxxxxxxxxxxxxx
Date: Mon, 10 Dec 2001 17:09:53 -0600
Dear Colleagues,

I am trying to sort a list of titles that have been processed using XSLT to
remove all leading articles (stopwords "A", "An", and "The"), and to remove
special characters such as [, ], and ^.

So far, I am unable to sort the list of titles correctly and would
appreciate whatever help you may provide. Please see relevant files below.


Brian L. Pytlik Zillig
Digital Initiatives Librarian
University of Nebraska-Lincoln Libraries

Here is my XML file, "lead.xml":

<?xml version="1.0" encoding="utf-8"?>

<?xml-stylesheet href="leadingstopwords.xsl" type="text/xsl"?>

<!-- Correct sorted order should be:
The American Way
A Better Way
An Evil Day
Xerxes Unchained: A Memoir
The Yanks Are Coming!
Zeitgeist as Poltergeist
A Zoo Story
-->

   <title>^A Zoo Story^</title>

   <title>[The Yanks Are Coming!]</title>

   <title>Zeitgeist as Poltergeist</title>

   <title>The American Way</title>

   <title>A Better Way</title>

   <title>An Evil Day</title>

   <title>Xerxes Unchained: A Memoir</title>

And here is my XSL file, "lead.xsl":

<?xml version='1.0'?>

<xsl:stylesheet version="1.0"




<xsl:template match="/">


<B>Sorted: </B><P />
<xsl:apply-templates select="//ead/book/title" mode="with-stoplist">
<xsl:sort select="$stoplist" order="descending"/>
</xsl:apply-templates>
</xsl:template>



<!-- Stoplist template to DROP initial articles "A", "An" and "The" in
title, and to remove special characters, including square brackets "[" and
so on -->
<xsl:template match="//ead/book/title" mode="with-stoplist">
<xsl:variable name="begins-with"
  select="$stoplist[starts-with(translate(current(), $uppercase, $lowercase),
                                concat(translate(., $uppercase, $lowercase), ' '))]" />
<xsl:value-of select="translate(substring(., string-length($begins-with) + 1),
                                '[/]-=@#$%^()', '')" />
<P />
</xsl:template>

<!-- Declares variables for sorting -->
<xsl:variable name="stoplist"
  select="document('')/xsl:stylesheet/stop:stoplist/ignore" />
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />


And here is my output file, created using Instant Saxon, "lead.html":

   <body><B>Sorted: </B><P></P>A Zoo Story
      <P></P>The Yanks Are Coming!
      <P></P>Zeitgeist as Poltergeist
      <P></P> American Way
      <P></P> Better Way
      <P></P> Evil Day
      <P></P>Xerxes Unchained: A Memoir
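
Two things in the stylesheet above appear to explain this output. The xsl:sort key is $stoplist, which has the same value for every title, so the titles come out in document order. The article test also runs on the raw title, so "^A Zoo Story^" and "[The Yanks Are Coming!]" never match a stopword, while the other titles keep the space that followed the article. What follows is a minimal XSLT 1.0 sketch of one way around both, not the poster's stylesheet. It assumes the titles sit under ead/book as in the select path, replaces the document('') stoplist with a hypothetical space-delimited string variable named "articles", introduces a hypothetical variable named "special" for the characters to strip, and assumes titles carry no leading whitespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Hypothetical names: characters to strip, and the stopwords
       as one space-delimited, lowercase string. -->
  <xsl:variable name="special" select="'[/]-=@#$%^()'"/>
  <xsl:variable name="articles" select="' a an the '"/>

  <xsl:template match="/">
    <html>
      <body>
        <B>Sorted: </B><P/>
        <xsl:apply-templates select="//ead/book/title" mode="with-stoplist">
          <!-- Sort key, computed per title: lowercase the title and strip
               special characters in one translate() call, then skip the
               first word plus its trailing space when that word is an
               article. number(true()) is 1 and number(false()) is 0, so
               the offset is added only when the test succeeds. -->
          <xsl:sort select="substring(
              translate(., concat($uppercase, $special), $lowercase),
              1 + (string-length(substring-before(
                      translate(., concat($uppercase, $special), $lowercase),
                      ' ')) + 1)
                * number(contains($articles, concat(' ',
                      substring-before(
                        translate(., concat($uppercase, $special), $lowercase),
                        ' '),
                      ' '))))"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <!-- Display: strip special characters first, then drop a leading article. -->
  <xsl:template match="title" mode="with-stoplist">
    <xsl:variable name="clean" select="translate(., $special, '')"/>
    <xsl:variable name="first"
      select="translate(substring-before($clean, ' '), $uppercase, $lowercase)"/>
    <xsl:choose>
      <xsl:when test="contains($articles, concat(' ', $first, ' '))">
        <xsl:value-of select="substring-after($clean, ' ')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$clean"/>
      </xsl:otherwise>
    </xsl:choose>
    <P/>
  </xsl:template>

</xsl:stylesheet>
```

The key is written as a single expression because XSLT 1.0 evaluates xsl:sort's select once per node and offers no per-node variable at that point. The single translate() call does double duty: when its second argument is longer than its third, the unmatched trailing characters (here the special characters) are deleted rather than replaced. To keep the articles in the displayed titles and ignore them only for ordering, the xsl:choose can be reduced to a single xsl:value-of of $clean.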

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
