Subject: RE: [xsl] Re: text() word lists
From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Mon, 9 Feb 2004 10:21:00 +0000 (GMT)

On Mon, 9 Feb 2004 David.Pawson@xxxxxxxxxxx wrote:

> I said:
> Is it possible to remove all numbers too?
> Or is that a part of the lexicographers toolset?

It can be (I'm reliably informed by a linguist sitting a few desks away), in that someone might be analysing the text of (say) a motoring magazine: "The A1-M1 link road" (for UK readers) or "a V6 Engine...or I could have had a V8", where any comparisons don't make sense without the numbers.

So what is the best way to parameterise these to allow turning on/off the removal of numbers? And while we're at it, turning on/off the removal of hyphens or other possibly-word-forming characters?

> <xsl:template match="/">
>   <frequencies>
>     <xsl:for-each-group group-by="." select="
>       for $w in tokenize(string(.), '[\s.?!,)(]+')[.] return lower-case($w)">
>       <xsl:sort select="count(current-group())" order="descending"/>
>       <xsl:analyze-string select="current-grouping-key()" regex="[0-9]+">
>         <xsl:non-matching-substring>
>           <word><xsl:value-of select="current-grouping-key(), ' - ',
>             count(current-group())"/></word>
>         </xsl:non-matching-substring>
>         <xsl:matching-substring/>
>       </xsl:analyze-string>
>     </xsl:for-each-group>
>   </frequencies>
>
> </xsl:template>
>
> Seems to work nicely.
> Thanks Michael, very useful.
>
> regards DaveP

---
Dr James Cummings, Oxford Text Archive, University of Oxford
James.Cummings at ota.ahds.ac.uk  http://users.ox.ac.uk/~jamesc/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
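
One possible way to get the on/off switches asked about above is a pair of stylesheet parameters. This is only a sketch in XSLT 2.0, not an answer from the thread: the parameter names strip-numbers and split-hyphens and the $separators variable are invented for illustration, and it assumes "removing numbers" means dropping tokens made up entirely of digits, so that "v6" and "a1-m1" survive.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical switches (not from the thread). They are compared as
       strings so that values supplied from a command line behave as expected. -->
  <xsl:param name="strip-numbers" select="'yes'"/>
  <xsl:param name="split-hyphens" select="'no'"/>

  <!-- Token separators: the character class from the quoted stylesheet,
       with the hyphen added only when split-hyphens is 'yes'. -->
  <xsl:variable name="separators"
      select="if ($split-hyphens = 'yes')
              then '[\s.?!,)(\-]+'
              else '[\s.?!,)(]+'"/>

  <xsl:template match="/">
    <frequencies>
      <!-- Lower-case each non-empty token; when strip-numbers is 'yes',
           filter out tokens that consist only of digits (an assumption). -->
      <xsl:for-each-group group-by="."
          select="for $w in tokenize(string(.), $separators)[.]
                  return lower-case($w)
                         [not($strip-numbers = 'yes' and matches(., '^[0-9]+$'))]">
        <xsl:sort select="count(current-group())" order="descending"/>
        <word>
          <xsl:value-of select="current-grouping-key(), ' - ',
                                count(current-group())"/>
        </word>
      </xsl:for-each-group>
    </frequencies>
  </xsl:template>

</xsl:stylesheet>
```

Filtering in the select expression, rather than with xsl:analyze-string inside the group, means a mixed token such as "a1-m1" is counted once as a whole word. Other possibly-word-forming characters could be handled the same way, by adding them to the separator class behind a further parameter.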