Subject: Re: [xsl] Collect word count with xslt2.0 on saxon 8
From: George Cristian Bina <george@xxxxxxxxxxxxx>
Date: Tue, 16 May 2006 10:04:22 +0300
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <counts>
      <xsl:apply-templates/>
    </counts>
  </xsl:template>

  <!-- Suppress text in the default mode; text is only collected through getText -->
  <xsl:template match="text()"/>

  <xsl:template match="*[contains(@class, 'topic/topic')]">
    <xsl:variable name="text">
      <xsl:apply-templates mode="getText" select="node()"/>
    </xsl:variable>
    <record>
      <text>
        <xsl:value-of select="$text"/>
      </text>
      <count>
        <xsl:value-of
          select="count(tokenize(lower-case($text),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
      </count>
    </record>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- In getText mode the built-in rules copy text and descend into other
       elements; nested topics are skipped by this empty rule -->
  <xsl:template match="*[contains(@class, 'topic/topic')]" mode="getText"/>
</xsl:stylesheet>
<?xml version="1.0" encoding="UTF-8"?>
<counts>
  <record>
    <text> communications and information theory top element elements can be nested Generalized Markup Language defined by ISO 8879. </text>
    <count>17</count>
  </record>
  <record>
    <text> communications and information theory top element elements can be nested (for a number of technical reasons beyond the scope of this article). </text>
    <count>22</count>
  </record>
  <record>
    <text> communications and information theory top element elements can be nested maintain repositories of structured documentation for more than a decade, but it is not well </text>
    <count>25</count>
  </record>
  <record>
    <text> But the metrics for XML on the Web communications and information theory top element elements can be nested measures, or are a little polluted by voodoo ideology about good </text>
    <count>29</count>
  </record>
</counts>
Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
I have the following structure, and I need to collect a word count for each element whose class attribute contains " topic/topic ", without counting the text of its child elements whose class attribute also contains " topic/topic ".
<root>
  <topic class=" topic/topic foo/bar ">
    <p> communications and information theory</p>
    <title> top element</title>
    <relinfo> elements can be nested</relinfo> Generalized Markup Language defined by ISO
    8879.
    <concept class=" topic/topic foo/bar ">
      <p> communications and information
      theory</p>
      <title> top element</title>
      <relinfo> elements can be nested</relinfo>
      (for a number of technical reasons beyond
      the scope of this article).
      <topic class=" topic/topic foo/bar ">
        <p> communications and information
        theory</p>
        <title> top element</title>
        <relinfo> elements can be
        nested</relinfo> maintain repositories of structured documentation for more than a decade, but it is not
        well <concept class=" topic/topic foo/bar ">
          But the metrics for XML on the Web
          <p> communications and
          information theory</p>
          <title> top element</title>
          <relinfo> elements can be
          nested</relinfo> measures, or are a little polluted
          by voodoo ideology about good </concept>
      </topic>
    </concept>
  </topic>
</root>
I have this template, which gets the word count for each such element, but the count also includes the text of descendant elements whose class attributes contain " topic/topic ".
<xsl:template match="*[contains(@class, 'topic/topic ')]">
  <xsl:variable name="level"
      select="count(ancestor::*[contains(@class, 'topic/topic ')]) + 1"/>
  <xsl:variable name="ct" select="if ($level = 1) then concat(title,' ') else ' '"/>
  <xsl:variable name="h1" select="if ($level = 2) then concat(title,' ') else ' '"/>
  <xsl:variable name="h2" select="if ($level = 3) then concat(title,' ') else ' '"/>
  <xsl:variable name="h3" select="if ($level = 4) then concat(title,' ') else ' '"/>
  <xsl:variable name="wc"
      select="count(tokenize(lower-case(.),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
  <xsl:apply-templates/>
</xsl:template>
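The counting expression in these templates splits the lower-cased text on runs of whitespace, the characters , . ! : ; and the literal string nbsp; (the pattern [n][b][s][p][;] is just a longer way of writing nbsp;). The [string(.)] predicate then throws away the empty strings that tokenize() returns when the text starts or ends with a separator. A minimal sketch that wraps the expression in a reusable function, assuming XSLT 2.0 (Saxon 8 or later); the f prefix, its namespace URI urn:example:wordcount and the name f:word-count are hypothetical:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:wordcount"
    exclude-result-prefixes="xs f">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: number of non-empty tokens left after splitting
       on whitespace, the characters , . ! : ; and the literal text "nbsp;" -->
  <xsl:function name="f:word-count" as="xs:integer">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="count(tokenize(lower-case($s), '(\s|[,.!:;]|nbsp;)+')[string(.)])"/>
  </xsl:function>

  <!-- Run against any input document; prints 17, the count George's
       stylesheet reports for the first record -->
  <xsl:template match="/">
    <xsl:value-of select="f:word-count(' communications and information theory top element elements can be nested Generalized Markup Language defined by ISO 8879. ')"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that parentheses are not separators in this pattern, so "(for" and "article)." each still count as one word.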
I added another template that computes the word count of each nested topic element:
<xsl:template match="*[contains(@class, 'topic/topic ')]" mode="filterCount">
  <sum>
    <xsl:value-of
        select="count(tokenize(lower-case(.),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
  </sum>
</xsl:template>
I store these counts in a variable and then subtract their total from $wc inside the first template above:
<xsl:variable name="childcounts">
  <sums>
    <xsl:apply-templates mode="filterCount"/>
  </sums>
</xsl:variable>
<xsl:variable name="total-child" select="sum($childcounts/sums/sum)"/>
<xsl:variable name="total-roman" select="$wc - $total-child"/>
I would like to find a more elegant approach, because there are other attribute values in this content that need the same technique applied to them.
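A different technique from the getText mode in George's stylesheet avoids the subtraction entirely: count only the text nodes whose nearest ancestor carrying the class token is the element itself, so text inside a nested topic belongs to that topic alone. Passing the token as a parameter lets the same function serve other class values. A sketch assuming XSLT 2.0; f:own-word-count, the f namespace URI and the token ' topic/topic ' (with surrounding spaces, matching the sample's class values) are illustrative:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:wordcount"
    exclude-result-prefixes="xs f">
  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: word count of $e, ignoring text that sits inside
       any descendant whose class attribute also contains $token -->
  <xsl:function name="f:own-word-count" as="xs:integer">
    <xsl:param name="e" as="element()"/>
    <xsl:param name="token" as="xs:string"/>
    <!-- [1] on the ancestor step picks the nearest matching ancestor -->
    <xsl:variable name="own-text" as="xs:string"
        select="string-join($e//text()[ancestor::*[contains(@class, $token)][1] is $e], '')"/>
    <xsl:sequence
        select="count(tokenize(lower-case($own-text), '(\s|[,.!:;]|nbsp;)+')[string(.)])"/>
  </xsl:function>

  <xsl:template match="/">
    <counts>
      <!-- Other class tokens can be passed to the function the same way -->
      <xsl:for-each select="//*[contains(@class, ' topic/topic ')]">
        <count><xsl:value-of select="f:own-word-count(., ' topic/topic ')"/></count>
      </xsl:for-each>
    </counts>
  </xsl:template>
</xsl:stylesheet>
```

On the sample document this should give the same counts as George's stylesheet (17, 22, 25 and 29). Joining the text nodes with an empty string mirrors how a string value is built, so words are split only where the source text already has separators.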
Would it be better to copy the elements to another document node and perform the word count there, applying it recursively to all child elements? If so, what would the template match look like?
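Copying into a new document node also works. A sketch, assuming XSLT 2.0: each topic builds a temporary tree holding a copy of its content with nested topics removed, counts the words in that tree's string value, and then lets the nested topics produce their own records. The mode name strip-nested is hypothetical. This is close to what the getText mode in George's stylesheet does, except that the copy keeps the element structure, so further filters (for other class values, say) can be added as extra strip-nested rules:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <counts>
      <xsl:apply-templates/>
    </counts>
  </xsl:template>

  <!-- Suppress text in the default mode; it is counted via the temporary tree -->
  <xsl:template match="text()"/>

  <xsl:template match="*[contains(@class, ' topic/topic ')]">
    <!-- Copy this topic's content, minus nested topics, into a new document node -->
    <xsl:variable name="own" as="document-node()">
      <xsl:document>
        <xsl:apply-templates select="node()" mode="strip-nested"/>
      </xsl:document>
    </xsl:variable>
    <record>
      <count>
        <xsl:value-of
            select="count(tokenize(lower-case(string($own)), '(\s|[,.!:;]|nbsp;)+')[string(.)])"/>
      </count>
    </record>
    <!-- Nested topics are reached here and get their own records -->
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Identity copy in strip-nested mode... -->
  <xsl:template match="@* | node()" mode="strip-nested">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="strip-nested"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...except nested topics, which are left out of the copy
       (default priority 0.5 beats the -0.5 of the identity rule) -->
  <xsl:template match="*[contains(@class, ' topic/topic ')]" mode="strip-nested"/>
</xsl:stylesheet>
```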