Subject: Re: How can I filter stoppwords
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Sat, 02 Sep 2000 09:24:29 +0100
Barbara,

>Does anybody know another way to filter stopp words?

I'm not sure, but I think you were only after filtering stop words that start the name of the book? Adapting Eric's solution:

The xsl:stylesheet element declares the necessaries, and the additional namespace 'sw' that is used for the internal data (the list of stop words). To prevent this namespace being declared on your output, use 'exclude-result-prefixes':

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="mailto:vdv@xxxxxxxxxxxx"
                exclude-result-prefixes="sw">
...
</xsl:stylesheet>
```

Then the declaration of the stop words that you want to filter out. I've put these in a variable so that they can be accessed easily:

```xml
<sw:stop>
  <word>the</word>
  <word>a</word>
  <word>is</word>
</sw:stop>

<xsl:variable name="stop-words"
              select="document('')/xsl:stylesheet/sw:stop/word" />
```

Declaration of two variables so that we can translate between upper and lower case fairly easily:

```xml
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
```

Now the template. I've only used one for brevity, but of course you can split it down into several through calling and applying templates.

Within this template, I iterate through each of the titles. For each title, I find all the stop words such that the current title starts with that stop word (plus a space, and all ignoring case). If there is such a match, then the title is substring()ed to give the resulting title by taking off the characters that make up the word it begins with.

```xml
<xsl:template match="/">
  <result>
    <xsl:for-each select="xmlfile/book/title">
      <before><xsl:value-of select="." /></before>
      <xsl:variable name="begins-with"
                    select="$stop-words[starts-with(translate(current(), $uppercase, $lowercase),
                                                    concat(translate(., $uppercase, $lowercase), ' '))]" />
      <after>
        <xsl:choose>
          <xsl:when test="$begins-with">
            <xsl:value-of select="substring(., string-length($begins-with) + 2)" />
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="." />
          </xsl:otherwise>
        </xsl:choose>
      </after>
    </xsl:for-each>
  </result>
</xsl:template>
```

This strips leading stop words in SAXON and MSXML (July). It works in Xalan-C++ v.0.40.0 except for the exclude-result-prefixes thing, which is ignored.

However...

>How do you XSL-create a sort criterion?

...you can't (at the moment) use a template to create a string to use as a sort criterion. Sort criteria have to be XPath select expressions. This problem will go away when (a) you can convert RTFs to node sets and/or (b) when you can use something like saxon:function to declare extension functions within XSLT.

For the meantime, then, you have to use something really horrible like:

```xml
<xsl:template match="/">
  <result>
    <xsl:for-each select="xmlfile/book/title">
      <xsl:sort select="concat(
                          substring(substring-after(., ' '),
                                    0 div boolean($stop-words[starts-with(translate(current(), $uppercase, $lowercase),
                                                                          concat(translate(., $uppercase, $lowercase), ' '))])),
                          substring(.,
                                    0 div not($stop-words[starts-with(translate(current(), $uppercase, $lowercase),
                                                                      concat(translate(., $uppercase, $lowercase), ' '))])))" />
      <title><xsl:value-of select="." /></title>
    </xsl:for-each>
  </result>
</xsl:template>
```

(Honestly, it doesn't look that much clearer even when it *is* indented ;)

This works in SAXON, MSXML (July) and Xalan (with the exception of the result-prefixes thing).

I hope that helps,

Jeni

Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
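A note on the sort key above, since the `0 div ...` part is easy to misread. In XPath 1.0, `0 div true()` is `0 div 1`, which is 0, and `substring(s, 0)` returns the whole string; `0 div false()` is `0 div 0`, which is NaN, and `substring(s, NaN)` returns the empty string. Concatenating one substring guarded by a condition with another guarded by its negation therefore behaves like an if/else inside a single expression. The following is only a minimal sketch of that idea in isolation; the strings 'kept' and 'dropped' are made up for illustration and are not part of the stylesheet above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- 0 div true() = 0; substring(s, 0) keeps every character -->
    <xsl:value-of select="substring('kept', 0 div true())" />
    <!-- 0 div false() = NaN; substring(s, NaN) keeps nothing -->
    <xsl:value-of select="substring('dropped', 0 div false())" />
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should write just `kept`. In the sort key, the condition is the node set of matching stop words: `boolean()` of a non-empty node set is true, so the first branch contributes the title after its first space, and `not()` of that node set selects the untouched title when nothing matches.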