Re: [xsl] text() word lists

Subject: Re: [xsl] text() word lists
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Fri, 6 Feb 2004 08:38:18 -0800 (PST)
> Hi there,
> 
> I'm sure this is a faq, and I've checked the faq and archive.
> I swear I remember someone asking about it, but I couldn't
> find it, so here goes.
> 
> I want to take an XML file of unknown elements and create
> a word frequency list / word list.  Now, an entry on sorting
> in the xslt faq says this is just what xslt is bad at.  (And
> I'm sure there are some that would say 'just go use perl',
> but let's say I want to do it in xslt (1 or 2).)
> 
> XSLT2 makes the tokenization of strings much easier, so
> assuming I'm using that, if I have:
> 
> <foo> 
> <blort>  This is a <wibble>Test</wibble>, only a test!</blort>
> <blort>  This really is a <wibble>great big test</wibble>, only a test!
> </blort>
> </foo> 
> 
> I don't know that foo|wibble|blort  will be the element names.
> 
> But I want to produce both:
> 
> a  -- 4
> test  -- 4
> only -- 2
> is  -- 2
> this  -- 2
> big -- 1
> great -- 1
> really -- 1
> 
> Which (unless I've missed something) should be
> a case-insensitive list grouped by frequency
> sorted alphabetically within this, and ignoring
> punctuation.
> 
> But also:
> 
> a  -- 4
> big -- 1
> great -- 1
> is  -- 2
> only -- 2
> really -- 1
> test  -- 4
> this  -- 2
> 
> Which is the same list but not grouped
> by frequency.
> 
> Suggestions? Solutions?
> 
> Many thanks for any help,
> -James
> ---
> Dr James Cummings, Oxford Text Archive, University of Oxford
> James.Cummings at ota.ahds.ac.uk http://users.ox.ac.uk/~jamesc/

This was intended to be essentially an XSLT 1.0 solution, until I realized
that XSLT 1.0 does not allow references to variables in xsl:key, so it needs
a small change to run on a strict XSLT 1.0 processor. Using FXSL and
Saxon 7, one would write:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
>

   <xsl:import href="strSplit-to-Words.xsl"/>
   
   <xsl:key name="kWordByVal" match="word" 
   use="translate(., $vUpper, $vLower)"/>
   
   <xsl:variable name="vUpper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
   <xsl:variable name="vLower" select="'abcdefghijklmnopqrstuvwxyz'"/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>
   
    <xsl:template match="/">
      <xsl:variable name="vwordNodes">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="/"/>
          <xsl:with-param name="pDelimiters" 
                          select="', &#9;&#10;&#13;!'"/>
        </xsl:call-template>
      </xsl:variable>  
      
      <xsl:for-each 
       select="ext:node-set($vwordNodes)/*[normalize-space()]
                    [generate-id()
                    =
                     generate-id(key('kWordByVal',
                                     translate(., $vUpper, $vLower)
                                    )[1])
                    ]">
          <xsl:sort select="count(key('kWordByVal',
                                     translate(., $vUpper, $vLower)
                                   )
                               )"
                            data-type="number"
                            order="descending" />
          
          <xsl:value-of 
          select="concat('&#xA;', 
                         translate(., $vUpper, $vLower),
                         ' - ',
                         count(key('kWordByVal',
                                     translate(., $vUpper, $vLower)
                                   )
                               )
                         )"/>
        
        </xsl:for-each>
      
    </xsl:template>
</xsl:stylesheet>
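
For a strict XSLT 1.0 processor, the small change mentioned above is
presumably to take the variable references out of xsl:key. A minimal,
untested sketch of that declaration, with the two alphabets written as
literals (the rest of the stylesheet can keep using $vUpper and $vLower):

```xml
<!-- XSLT 1.0 forbids variable references in the "use" and "match"
     attributes of xsl:key, so the alphabets are spelled out here. -->
<xsl:key name="kWordByVal" match="word"
         use="translate(.,
                        'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                        'abcdefghijklmnopqrstuvwxyz')"/>
```

Everything else in the stylesheet stays as shown.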

When this transformation is applied to your source document:

<foo> 
 <blort>  This is a <wibble>Test</wibble>, only a test!</blort>
 <blort>  This really is a <wibble>great big test</wibble>, only a test! 
 </blort>
</foo> 

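the following result is produced (grouped by frequency; words with the same
count stay in document order, not alphabetical order):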

a - 4
test - 4
this - 2
is - 2
only - 2
really - 1
great - 1
big - 1


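For the other output you just have to change xsl:sort: set its "select"
attribute to the lower-cased word, translate(., $vUpper, $vLower), and drop
the "data-type" and "order" attributes so that the sort is textual and
ascending.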
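
As a sketch of the xsl:sort change (untested here; it reuses the kWordByVal
key and the $vUpper / $vLower variables from the stylesheet above), the sort
inside the xsl:for-each could take one of these two forms. Use one variant,
not both:

```xml
<!-- Variant 1: plain alphabetical list.
     Replaces the existing xsl:sort; data-type="text" and
     order="ascending" are the defaults, so they are omitted. -->
<xsl:sort select="translate(., $vUpper, $vLower)"/>

<!-- Variant 2: by frequency, then alphabetically within each frequency.
     Keeps the count-based sort as the primary key and adds the
     lower-cased word as a secondary key. -->
<xsl:sort select="count(key('kWordByVal',
                            translate(., $vUpper, $vLower)))"
          data-type="number"
          order="descending"/>
<xsl:sort select="translate(., $vUpper, $vLower)"/>
```

With several xsl:sort children, the first is the primary key and each later
one only breaks ties left by the ones before it.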

Solving this kind of task is almost trivial using FXSL.
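
For comparison, since the question allows XSLT 2.0, here is a minimal sketch
that does not use FXSL at all. It is a different technique from the
stylesheet above: tokenize() plus xsl:for-each-group instead of a key. It
assumes an XSLT 2.0 processor and treats only the ASCII letters a-z as word
characters; that last choice is an assumption of this sketch, not something
the question specifies.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Lower-case the whole text of the document, split it on runs of
         non-letters, and discard the empty tokens that appear at the
         start or end of the string. -->
    <xsl:for-each-group
         select="tokenize(lower-case(string(/)), '[^a-z]+')[. != '']"
         group-by=".">
      <!-- Primary key: frequency, highest first -->
      <xsl:sort select="count(current-group())" order="descending"/>
      <!-- Secondary key: the word itself, alphabetically -->
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="concat(current-grouping-key(), ' - ',
                                   count(current-group()), '&#xA;')"/>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Removing the first xsl:sort should give the plain alphabetical list.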


Cheers,

Dimitre Novatchev
FXSL developer

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html


