Re: [xsl] sorting a list of titles after removal of stopwords and special characters

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Dec 2001 17:09:11 +0000

Trevor Nash wrote:
> What you need is an expression that, given the context of a title
> element, will return a string containing the edited title (stop words
> removed).  This cannot be done with standard XSLT, but you have three
> possibilities:

Actually, it's not *impossible* with standard XSLT, although
admittedly it isn't pretty. Assuming that $punctuation is a string
holding the ignorable punctuation characters, that $lowercase and
$uppercase hold the two alphabets, and that the list of stopwords is
ordered so that 'an' comes before 'a' rather than after it (the
expression uses the first matching stopword in document order), you
could use:

 concat(
  substring(
   substring(translate(title, $punctuation, ''),
             string-length(
              $stoplist[starts-with(
                         translate(current()/title,
                                   concat($lowercase, $punctuation),
                                   $uppercase),
                         translate(., $lowercase, $uppercase))]) + 2),
   1 div boolean($stoplist[starts-with(
                            translate(current()/title,
                                      concat($lowercase, $punctuation),
                                      $uppercase),
                            translate(., $lowercase, $uppercase))])),
  substring(
   translate(title, $punctuation, ''),
   1 div not($stoplist[starts-with(
                        translate(current()/title,
                                  concat($lowercase, $punctuation),
                                  $uppercase),
                        translate(., $lowercase, $uppercase))])))
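
The expression above relies on an XPath 1.0 idiom for emulating a
conditional: `1 div true()` is 1, so substring() returns the whole
string, while `1 div false()` is Infinity, so substring() returns the
empty string. Here is a minimal sketch of the idiom in isolation; the
input document and the @flag attribute are hypothetical, made up only
for illustration:

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <items><item flag="yes">x</item><item>y</item></items> -->
  <xsl:template match="/items">
    <xsl:for-each select="item">
      <!-- Emulates: if (@flag) then 'flagged' else 'plain'.
           Exactly one of the two substring() calls yields its
           full string; the other starts at Infinity and yields ''. -->
      <xsl:value-of select="concat(
          substring('flagged', 1 div boolean(@flag)),
          substring('plain',   1 div not(@flag)))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the hypothetical input shown in the comment, the first item takes
the 'flagged' branch and the second takes the 'plain' branch.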

If we were using XPath 2.0, assuming an if statement similar to
that in XQuery, it would look something like:

  if ($stoplist[starts-with(
                 translate(current()/title,
                           concat($lowercase, $punctuation),
                           $uppercase),
                 translate(., $lowercase, $uppercase))])
  then substring(translate(title, $punctuation, ''),
                 string-length(
                   $stoplist[starts-with(
                             translate(current()/title,
                                       concat($lowercase, $punctuation),
                                       $uppercase),
                             translate(., $lowercase, $uppercase))]) + 2)
  else translate(title, $punctuation, '')

which isn't that much more pleasant.

If the stop words were stored with a space, as:

  <ignore>the </ignore>
  <ignore>an </ignore>
  <ignore>a </ignore>

(which would probably be a good idea anyway, given that quite a few
titles might begin with the letter 'A') then you could use simply:

 substring(translate(title, $punctuation, ''),
           string-length(
             $stoplist[starts-with(
                        translate(current()/title,
                                  concat($lowercase, $punctuation),
                                  $uppercase),
                        translate(., $lowercase, $uppercase))]) + 1)
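
To show where this expression would sit, here is a sketch of a
complete stylesheet that uses it as a sort key. Several things in it
are my assumptions rather than anything from the original question:
the <books>/<book>/<title> input structure, the sw: namespace, the
particular punctuation characters, and loading the stop list from the
stylesheet itself with document(''), which only works when the
processor knows the stylesheet's base URI.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="urn:example:stopwords"
                exclude-result-prefixes="sw">

  <xsl:output method="text"/>

  <!-- Stop words, each stored with a trailing space (hypothetical
       namespace; any non-XSLT namespace would do) -->
  <sw:ignore>the </sw:ignore>
  <sw:ignore>an </sw:ignore>
  <sw:ignore>a </sw:ignore>

  <!-- document('') reads the stylesheet itself as a source document -->
  <xsl:variable name="stoplist" select="document('')/*/sw:ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Assumed set of ignorable characters; it must not contain a space -->
  <xsl:variable name="punctuation">.,:;!?'"()</xsl:variable>

  <!-- Hypothetical input:
       <books>
         <book><title>The Zebra</title></book>
         <book><title>An Apple</title></book>
         <book><title>A Mango</title></book>
         <book><title>Bananas!</title></book>
       </books> -->
  <xsl:template match="/books">
    <xsl:for-each select="book">
      <!-- Sort key: the title without punctuation and without a
           leading stop word. If no stop word matches, string-length()
           of the empty node-set is 0 and the whole title is kept. -->
      <xsl:sort select="substring(translate(title, $punctuation, ''),
                          string-length(
                            $stoplist[starts-with(
                              translate(current()/title,
                                        concat($lowercase, $punctuation),
                                        $uppercase),
                              translate(., $lowercase, $uppercase))]) + 1)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input in the comment, the sort keys would be Zebra,
Apple, Mango and Bananas, so the titles should come out in the order
An Apple, Bananas!, A Mango, The Zebra. The titles are printed
unedited; only the sort key has the stop word and punctuation removed.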
                        
>    1) You are using Saxon, which has an extension saxon:function
>     which lets you write a function in XSLT - more or less the
>     contents of your mode="with-stoplist" template.

Just to mention, you can also use func:function from the EXSLT
namespace http://exslt.org/functions in Saxon, 4XSLT, jd.xslt and
libxslt to achieve this. It's more portable to use func:function than
to use saxon:function (because it's available in those other
processors), but they do basically the same thing. See
http://www.exslt.org/func for details.
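
As a rough sketch of what that might look like, here is the
stored-with-a-space version of the expression wrapped in a
func:function. The function name my:sort-key, the my: namespace, the
<books>/<book>/<title> input structure and the punctuation set are all
hypothetical, and it will only run in a processor that supports
func:function:

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:func="http://exslt.org/functions"
                xmlns:my="urn:example:my"
                extension-element-prefixes="func"
                exclude-result-prefixes="my">

  <xsl:output method="text"/>

  <!-- Stop words stored with a trailing space (hypothetical namespace) -->
  <my:ignore>the </my:ignore>
  <my:ignore>an </my:ignore>
  <my:ignore>a </my:ignore>

  <xsl:variable name="stoplist" select="document('')/*/my:ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Assumed set of ignorable characters; it must not contain a space -->
  <xsl:variable name="punctuation">.,:;!?'"()</xsl:variable>

  <!-- my:sort-key(title): the title with punctuation and any leading
       stop word removed -->
  <func:function name="my:sort-key">
    <xsl:param name="title"/>
    <xsl:variable name="clean"
                  select="translate($title, $punctuation, '')"/>
    <!-- Stop words that the cleaned title starts with, ignoring case -->
    <xsl:variable name="stop"
                  select="$stoplist[starts-with(
                            translate($clean, $lowercase, $uppercase),
                            translate(., $lowercase, $uppercase))]"/>
    <func:result select="substring($clean, string-length($stop) + 1)"/>
  </func:function>

  <xsl:template match="/books">
    <xsl:for-each select="book">
      <xsl:sort select="my:sort-key(title)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the title is passed in as a parameter, the predicate no longer
needs current(), and the select attribute on xsl:sort shrinks to a
single function call.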

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/