Subject: Re: [xsl] sorting a list of titles after removal of stopwords and special characters
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Dec 2001 17:09:11 +0000

Trevor Nash wrote:
> What you need is an expression that, given the context of a title
> element, will return a string containing the edited title (stop words
> removed). This cannot be done with standard XSLT, but you have three
> possibilities:

Actually, it's not *impossible* with standard XSLT, although
admittedly it isn't pretty. Assuming that $punctuation is a string
holding the ignorable punctuation characters and that the list of
stopwords is sorted such that 'an' comes before 'a' rather than after
it, you could use:

  concat(
    substring(
      substring(translate(title, $punctuation, ''),
                string-length(
                  $stoplist[starts-with(
                              translate(current()/title,
                                        concat($lowercase, $punctuation),
                                        $uppercase),
                              translate(., $lowercase, $uppercase))]) + 2),
      1 div boolean($stoplist[starts-with(
                                translate(current()/title,
                                          concat($lowercase, $punctuation),
                                          $uppercase),
                                translate(., $lowercase, $uppercase))])),
    substring(
      translate(title, $punctuation, ''),
      1 div not($stoplist[starts-with(
                            translate(current()/title,
                                      concat($lowercase, $punctuation),
                                      $uppercase),
                            translate(., $lowercase, $uppercase))])))

If we were using XPath 2.0, assuming an if statement similar to that
in XQuery, it would look something like:

  if ($stoplist[starts-with(
                  translate(current()/title,
                            concat($lowercase, $punctuation),
                            $uppercase),
                  translate(., $lowercase, $uppercase))])
  then substring(translate(title, $punctuation, ''),
                 string-length(
                   $stoplist[starts-with(
                               translate(current()/title,
                                         concat($lowercase, $punctuation),
                                         $uppercase),
                               translate(., $lowercase, $uppercase))]) + 2)
  else translate(title, $punctuation, '')

which isn't that much more pleasant.
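
For readers unfamiliar with the "1 div boolean(...)" and "1 div not(...)" arguments in the XPath 1.0 expression above, here is a minimal sketch of that conditional idiom on its own. The variable $test and the two literal strings are illustrative assumptions, not part of the original stylesheet. In XPath 1.0 a true boolean converts to 1 and a false one to 0, so the start position is either 1 (keep the whole string) or 1 div 0 = Infinity (keep nothing). The explicit boolean() matters in the real expression because a node-set used directly as a number would be converted via its string value rather than its emptiness.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical flag standing in for boolean($stoplist[...]) -->
  <xsl:variable name="test" select="true()"/>

  <xsl:template match="/">
    <!-- When $test is true:  1 div 1 = 1, so the first substring() keeps
         its whole string, while 1 div 0 = Infinity empties the second.
         When $test is false the two roles swap. -->
    <xsl:value-of select="concat(
        substring('stop word found',    1 div $test),
        substring('no stop word found', 1 div not($test)))"/>
  </xsl:template>
</xsl:stylesheet>
```

With $test set to true() as written, this should output "stop word found". Changing it to false() should give "no stop word found".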

If the stop words were stored with a space, as:

  <ignore>the </ignore>
  <ignore>an </ignore>
  <ignore>a </ignore>

(which would probably be a good idea anyway, given that quite a few
titles might begin with the letter 'A') then you could use simply:

  substring(translate(title, $punctuation, ''),
            string-length(
              $stoplist[starts-with(
                          translate(current()/title,
                                    concat($lowercase, $punctuation),
                                    $uppercase),
                          translate(., $lowercase, $uppercase))]) + 1)

> 1) You are using Saxon, which has an extension saxon:function
> which lets you write a function in XSLT - more or less the
> contents of your mode="with-stoplist" template.

Just to mention, you can also use func:function from the EXSLT
namespace http://exslt.org/functions in Saxon, 4XSLT, jd.xslt and
libxslt to achieve this. It's more portable to use func:function than
to use saxon:function (because it's available in those other
processors), but they do basically the same thing. See
http://www.exslt.org/func for details.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
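
As a postscript, here is a rough sketch of how the simpler expression (stop words stored with a trailing space) might be wired into an xsl:sort in plain XSLT 1.0. Everything around the expression is an assumption made for illustration: the books/book/title input shape, the stop: namespace used to embed the stop list in the stylesheet, and the particular punctuation characters. When no stop word matches, string-length() of the empty node-set is 0, so the substring starts at position 1 and the whole title is used as the sort key.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:stop="http://example.org/stop"
                exclude-result-prefixes="stop">
  <xsl:output method="text"/>

  <!-- Hypothetical stop list embedded in the stylesheet itself;
       each entry keeps its trailing space -->
  <stop:list>
    <ignore>the </ignore>
    <ignore>an </ignore>
    <ignore>a </ignore>
  </stop:list>

  <!-- document('') reads the stylesheet, so $stoplist holds the ignore elements -->
  <xsl:variable name="stoplist" select="document('')/*/stop:list/ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Hypothetical choice of ignorable punctuation -->
  <xsl:variable name="punctuation">.,:;!?'"()</xsl:variable>

  <!-- Assumes input shaped like
       <books><book><title>The Hobbit</title></book>...</books> -->
  <xsl:template match="/books">
    <xsl:for-each select="book">
      <!-- Inside xsl:sort, current() is the book being sorted -->
      <xsl:sort select="substring(translate(title, $punctuation, ''),
                          string-length(
                            $stoplist[starts-with(
                              translate(current()/title,
                                        concat($lowercase, $punctuation),
                                        $uppercase),
                              translate(., $lowercase, $uppercase))]) + 1)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Under these assumptions a title such as "The Hobbit" would be sorted by the key "Hobbit", while "Another Country" keeps its full text as the key because neither "AN " nor "A " is a prefix of it.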