Subject: [xsl] Re: grouping and word counting
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Sat, 19 Jul 2003 18:56:04 +0200
Hi Marina,

One can use the string tokeniser from FXSL (the "str-split-to-words" template) in order to obtain a list of words from a string and then count them. This, combined with the Muenchian method for grouping, gives us the following solution.

This transformation:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ext="http://exslt.org/common"
  exclude-result-prefixes="ext">

  <xsl:import href="strSplit-to-Words.xsl"/>

  <xsl:output method="text"/>

  <xsl:key name="kMsg" match="MESSAGE" use="."/>
  <xsl:key name="kByCount" match="m" use="@count"/>

  <xsl:template match="/">
    <xsl:variable name="vPass1">
      <xsl:for-each select="/*/*/MESSAGE[generate-id() = generate-id(key('kMsg', . )[1] ) ]">
        <xsl:sort select="count(key('kMsg',.))" data-type="number"/>
        <m count="{count(key('kMsg',.))}" text="{.}"/>
      </xsl:for-each>
    </xsl:variable>

    <xsl:for-each select="ext:node-set($vPass1)/m
                  [generate-id() = generate-id(key('kByCount', @count )[1] ) ]">
      <xsl:sort select="count(key('kByCount', @count))" data-type="number"/>

      <xsl:variable name="vAllText">
        <xsl:for-each select="key('kByCount', @count)">
          <xsl:value-of select="concat(' ', @text, ' ')"/>
        </xsl:for-each>
      </xsl:variable>

      <xsl:variable name="vrtfWords">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="$vAllText"/>
          <xsl:with-param name="pDelimiters" select="' '"/>
        </xsl:call-template>
      </xsl:variable>

      <xsl:variable name="vAvWords"
        select="(count(ext:node-set($vrtfWords)/word) - 1)
                div count(key('kByCount', @count))"/>

      <xsl:value-of select="concat(count(key('kByCount', @count ) ), ' ',
                                   @count, ' ',
                                   $vAvWords, '&#xA;' )"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

when applied to your source.xml:

<LOG>
  <SENT>
    <USER> 12345 </USER>
    <LOCATION> 55555 </LOCATION>
    <TARGET> 1 </TARGET>
    <TARGET_LOCATION> 23222 </TARGET_LOCATION>
    <MESSAGE> hello Fred </MESSAGE>
  </SENT>
  <SENT>
    <USER> 77777 </USER>
    <LOCATION> 76666 </LOCATION>
    <TARGET> 3 </TARGET>
    <TARGET_LOCATION> 34444 </TARGET_LOCATION>
    <MESSAGE> nice weather </MESSAGE>
  </SENT>
  <SENT>
    <USER> 77777 </USER>
    <LOCATION> 76666 </LOCATION>
    <TARGET> 4 </TARGET>
    <TARGET_LOCATION> 67777 </TARGET_LOCATION>
    <MESSAGE> nice weather </MESSAGE>
  </SENT>
  <SENT>
    <USER> 33333 </USER>
    <LOCATION> 12666 </LOCATION>
    <TARGET> 8 </TARGET>
    <TARGET_LOCATION> 98765 </TARGET_LOCATION>
    <MESSAGE> whats the latest news? </MESSAGE>
  </SENT>
  <SENT>
    <USER> 33333 </USER>
    <LOCATION> 12666 </LOCATION>
    <TARGET> 9 </TARGET>
    <TARGET_LOCATION> 46578 </TARGET_LOCATION>
    <MESSAGE> whats the latest news? </MESSAGE>
  </SENT>
</LOG>

produces the wanted result:

1 1 2
2 2 3

Hope this helped.

=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL

"marina" <marina777uk@xxxxxxxxx> wrote in message
news:20030719075801.60127.qmail@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Hi,
>
> I have an XML document that contains messages sent by
> people to one another. Many of these messages in the
> <MESSAGE> tags are repeated as they are sent by one
> person to many others.
>
> XML Snippet:
> --------------------------------------------------
> <LOG>
>   <SENT>
>     <USER> 12345 </USER>
>     <LOCATION> 55555 </LOCATION>
>     <TARGET> 1 </TARGET>
>     <TARGET_LOCATION> 23222 </TARGET_LOCATION>
>     <MESSAGE> hello Fred </MESSAGE>
>   </SENT>
>   <SENT>
>     <USER> 77777 </USER>
>     <LOCATION> 76666 </LOCATION>
>     <TARGET> 3 </TARGET>
>     <TARGET_LOCATION> 34444 </TARGET_LOCATION>
>     <MESSAGE> nice weather </MESSAGE>
>   </SENT>
>   <SENT>
>     <USER> 77777 </USER>
>     <LOCATION> 76666 </LOCATION>
>     <TARGET> 4 </TARGET>
>     <TARGET_LOCATION> 67777 </TARGET_LOCATION>
>     <MESSAGE> nice weather </MESSAGE>
>   </SENT>
>   <SENT>
>     <USER> 33333 </USER>
>     <LOCATION> 12666 </LOCATION>
>     <TARGET> 8 </TARGET>
>     <TARGET_LOCATION> 98765 </TARGET_LOCATION>
>     <MESSAGE> whats the latest news? </MESSAGE>
>   </SENT>
>   <SENT>
>     <USER> 33333 </USER>
>     <LOCATION> 12666 </LOCATION>
>     <TARGET> 9 </TARGET>
>     <TARGET_LOCATION> 46578 </TARGET_LOCATION>
>     <MESSAGE> whats the latest news? </MESSAGE>
>   </SENT>
> </LOG>
> --------------------------------------------------
> What I need to do is:-
>
> 1) Find out how many messages over all were sent to 1,
> 2, 3 etc people.
>
> As a duplicated message will always follow the
> original, i.e. be the next <MESSAGE> tag of the
> following sibling node, I'm thinking that the
> stylesheet would start with the first message and keep
> comparing siblings until it found one that was
> different. Then it would just add the previous number
> of sibling nodes? ( I probably need to use keys?)
>
> 2) For each of the total messages per group size,
> calculate the average number of words. No idea on this
> one I'm afraid!
>
> So the desired output from the snippet above would be:-
>
> Group Size   Number of Messages   Av Number Words
> 1            1                    2
> 2            2                    3
> (up to say 20)
>
> Many thanks in advance for any help,
>
> Marina
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
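The two grouping passes in the stylesheet (distinct messages via key 'kMsg', then distinct occurrence counts via key 'kByCount') can be cross-checked with a minimal Python sketch using only the standard library. It is not FXSL and not part of the original answer. The function name group_stats, the trimmed LOG_XML sample (only the MESSAGE elements from the log above), and the whitespace normalisation of message text are assumptions made for illustration. The stylesheet's key use="." compares raw string values instead, and words here are simply runs of non-whitespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample: the MESSAGE elements from the log above, other children omitted.
LOG_XML = """<LOG>
  <SENT><MESSAGE> hello Fred </MESSAGE></SENT>
  <SENT><MESSAGE> nice weather </MESSAGE></SENT>
  <SENT><MESSAGE> nice weather </MESSAGE></SENT>
  <SENT><MESSAGE> whats the latest news? </MESSAGE></SENT>
  <SENT><MESSAGE> whats the latest news? </MESSAGE></SENT>
</LOG>"""


def group_stats(xml_text):
    """Return (group_size, message_count, average_words) tuples sorted by group size."""
    root = ET.fromstring(xml_text)

    # Pass 1 (cf. key 'kMsg'): how many times each distinct message text occurs.
    # Assumption: whitespace is normalised before texts are compared.
    counts = Counter(" ".join((m.text or "").split()) for m in root.iter("MESSAGE"))

    # Pass 2 (cf. key 'kByCount'): bucket the distinct messages by occurrence count,
    # keeping each message's word count (words = runs of non-whitespace).
    words_by_size = defaultdict(list)
    for text, size in counts.items():
        words_by_size[size].append(len(text.split()))

    return [
        (size, len(word_counts), sum(word_counts) / len(word_counts))
        for size, word_counts in sorted(words_by_size.items())
    ]


if __name__ == "__main__":
    for size, n_messages, avg_words in group_stats(LOG_XML):
        print(size, n_messages, f"{avg_words:g}")
```

Run on the sample, this prints "1 1 2" and "2 2 3", the same two lines as the stylesheet output. The sketch puts the group size first, as in Marina's table, and sorts by it. The Counter and the defaultdict play the roles of the two xsl:key lookups.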