Subject: Re: [xsl] text() word lists
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Fri, 6 Feb 2004 08:38:18 -0800 (PST)

> Hi there,
>
> I'm sure this is a faq, and I've checked the faq and archive.
> I swear I remember someone asking about it, but I couldn't
> find it, so here goes.
>
> I want to take an XML file of unknown elements and create
> a word frequency list / word list. Now, an entry on sorting
> in the xslt faq says this is just what xslt is bad at. (And
> I'm sure there are some that would say 'just go use perl',
> but let's say I want to do it in xslt(1 or 2).
>
> XSLT2 makes the tokenization of strings much easier, so
> assuming I'm using that, if I have:
>
> <foo>
> <blort> This is a <wibble>Test</wibble>, only a test!</blort>
> <blort> This really is a <wibble>great big test</wibble>, only a test!
> </blort>
> </foo>
>
> I don't know that foo|wibble|blort will be the element names.
>
> But I want to produce both:
>
> a -- 4
> test -- 4
> only -- 2
> is -- 2
> this -- 2
> big -- 1
> great -- 1
> really -- 1
>
> Which (unless I've missed something) should be
> a case-insensitive list grouped by frequency
> sorted alphabetically within this, and ignoring
> punctuation.
>
> But also:
>
> a -- 4
> big -- 1
> great -- 1
> is -- 2
> only -- 2
> test -- 4
> this -- 2
> really -- 1
>
> Which is the same list by not grouped
> by frequency.
>
> Suggestions? Solutions?
>
> Many thanks for any help,
> -James
> ---
> Dr James Cummings, Oxford Text Archive, University of Oxford
> James.Cummings at ota.ahds.ac.uk http://users.ox.ac.uk/~jamesc/

Using FXSL and Saxon 7, one would write the stylesheet below. (This was
intended to be essentially an XSLT 1.0 solution, until I realized that
variables cannot be referenced in xsl:key. I need to change it a little
bit to work in XSLT 1.0.)

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
>

  <xsl:import href="strSplit-to-Words.xsl"/>

  <!-- Index every word element by its lower-cased value -->
  <xsl:key name="kWordByVal" match="word"
           use="translate(., $vUpper, $vLower)"/>

  <xsl:variable name="vUpper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="vLower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- Split all text in the document into word elements -->
    <xsl:variable name="vwordNodes">
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="/"/>
        <!-- Delimiters: comma, space, tab, newline, exclamation mark -->
        <xsl:with-param name="pDelimiters" select="', &#9;&#10;!'"/>
      </xsl:call-template>
    </xsl:variable>

    <!-- Visit the first occurrence of each distinct word (Muenchian grouping) -->
    <xsl:for-each select=
      "ext:node-set($vwordNodes)/*[normalize-space()]
          [generate-id()
          =
           generate-id(key('kWordByVal',
                           translate(., $vUpper, $vLower)
                           )[1])
          ]">
      <!-- Most frequent words first -->
      <xsl:sort select="count(key('kWordByVal',
                                  translate(., $vUpper, $vLower)
                                  )
                              )"
                data-type="number" order="descending"/>

      <xsl:value-of select=
        "concat('&#xA;',
                translate(., $vUpper, $vLower), ' - ',
                count(key('kWordByVal',
                          translate(., $vUpper, $vLower)
                          )
                      )
                )"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied to your source.xml:

<foo>
<blort> This is a <wibble>Test</wibble>, only a test!</blort>
<blort> This really is a <wibble>great big test</wibble>, only a test!
</blort>
</foo>

The wanted result is produced:

a - 4
test - 4
this - 2
is - 2
only - 2
really - 1
great - 1
big - 1

For the other output you just have to change the "select" attribute of
xsl:sort.

Solving this kind of task is almost trivial using FXSL.

Cheers,

Dimitre Novatchev
FXSL developer,

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
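Since the question allows XSLT 2.0, the same word count can be sketched without FXSL or keys. The stylesheet below is only an illustration and is not taken from the reply above: it assumes a processor that implements XSLT 2.0 as finally standardized (xsl:for-each-group, tokenize(), lower-case()), and it assumes that a "word" is any run of Unicode letters, which goes beyond the delimiter list used with FXSL. The variable name "words" is made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumption: lower-case all text in the document, then split on
         runs of non-letter characters; drop the empty tokens that a
         leading or trailing separator leaves behind. -->
    <xsl:variable name="words"
        select="tokenize(lower-case(string(.)), '[^\p{L}]+')[. != '']"/>

    <!-- One group per distinct word -->
    <xsl:for-each-group select="$words" group-by=".">
      <!-- Most frequent first, alphabetical within equal counts -->
      <xsl:sort select="count(current-group())"
                data-type="number" order="descending"/>
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="concat(current-grouping-key(), ' -- ',
                                   count(current-group()), '&#xA;')"/>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

With both xsl:sort elements in place, the output is ordered by frequency and then alphabetically. Removing the first xsl:sort leaves a purely alphabetical list, which is the same kind of one-attribute change the reply describes for the FXSL version.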