Subject: [xsl] distinct-values() optimization, sorting by frequency
From: "James Cummings" <cummings.james@xxxxxxxxx>
Date: Fri, 8 Feb 2008 14:27:56 +0000

Hiya,
I'm wondering what the best way is to optimize a distinct-values()-based
transformation. What I'm basically doing is:
======
<xsl:variable name="docs" select="collection('../../working/xml/files.xml')"/>

<xsl:template name="main">
  <xsl:variable name="persNames" select="$docs//tei:text//tei:persName"/>
  <xsl:variable name="norm-persNames"
    select="$persNames/normalize-space(lower-case(.))"/>
  <xsl:variable name="distinct-persNames"
    select="distinct-values($norm-persNames)"/>
  <!-- I realize that I could be more specific on the $persNames
       variable, but doing so doesn't seem to affect speed much at all. -->
  <div type="main">
    <!-- Some overall counts -->
    <div>
      <head>Overall Counts</head>
      <list type="unordered">
        <item>Number of <gi>persName</gi> elements total:
          <xsl:value-of select="count($persNames)"/></item>
        <item>Number of <gi>persName</gi> elements which have a @key
          attribute total: <xsl:value-of
          select="count($persNames[@key])"/></item>
        <item>Number of distinct-value <gi>persName</gi> elements total:
          <xsl:value-of select="count($distinct-persNames)"/></item>
      </list>
    </div>
    <!-- An Alphabetical List -->
    <div>
      <head>Alphabetical List</head>
      <list type="unordered">
        <xsl:for-each select="$distinct-persNames">
          <xsl:sort select="."/>
          <xsl:variable name="current-name" select="."/>
          <xsl:variable name="count-distinct-current-name"
            select="count($persNames[normalize-space(lower-case(.)) = $current-name])"/>
          <item><xsl:value-of select="concat($current-name,
            ' -- ', $count-distinct-current-name)"/></item>
        </xsl:for-each>
      </list>
    </div>
    <!-- A Frequency Sorted List -->
    <div>
      <head>Frequency List</head>
      <list type="unordered">
        <xsl:for-each select="$distinct-persNames">
          <!-- current() is the distinct name being sorted; a bare "." inside
               the predicate would refer to each persName instead. -->
          <xsl:sort select="count($persNames[normalize-space(lower-case(.))
            = current()])"/>
          <!-- I think it is this sort statement which slows things down, since
               I have to repeat it twice. -->
          <xsl:variable name="current-name" select="."/>
          <xsl:variable name="count-distinct-current-name"
            select="count($persNames[normalize-space(lower-case(.))
            = $current-name])"/>
          <item><xsl:value-of select="concat($count-distinct-current-name,
            ' -- ', $current-name)"/></item>
        </xsl:for-each>
      </list>
    </div>
  </div>
</xsl:template>
======
I think the real slow-down comes in the second xsl:for-each where I
want to sort by frequency of distinct-value by doing:
<xsl:sort select="count($persNames[normalize-space(lower-case(.)) = current()])"/>
I have to have it for the sort, and then I have to re-do it for the
output inside the <item> element. I'm obviously not allowed a
variable between the for-each and the sort... but I have a feeling I'm
missing some clever optimization here.
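
One possible way to avoid computing the count twice, sketched here on the
assumption that an XSLT 2.0 processor is in use (distinct-values() already
requires one): xsl:for-each-group partitions the persName elements once, and
count(current-group()) is then available both as the sort key and in the
output, so $persNames is not rescanned for every distinct name. The template
name "frequency-list" and the TEI namespace binding are assumptions for
illustration; the collection path and element names are taken from the listing
above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <!-- Assumption: the tei prefix is bound to the TEI P5 namespace. -->

  <xsl:variable name="docs"
    select="collection('../../working/xml/files.xml')"/>

  <!-- Hypothetical template name, used only for this sketch. -->
  <xsl:template name="frequency-list">
    <xsl:variable name="persNames"
      select="$docs//tei:text//tei:persName"/>
    <list type="unordered">
      <!-- Group once by the normalized, lower-cased name. -->
      <xsl:for-each-group select="$persNames"
          group-by="normalize-space(lower-case(.))">
        <!-- Primary key: group size (the frequency), ascending as in the
             original listing. Secondary key: the name, to break ties. -->
        <xsl:sort select="count(current-group())"/>
        <xsl:sort select="current-grouping-key()"/>
        <item>
          <xsl:value-of select="concat(count(current-group()),
            ' -- ', current-grouping-key())"/>
        </item>
      </xsl:for-each-group>
    </list>
  </xsl:template>

</xsl:stylesheet>
```

The same grouping could also drive the alphabetical list by sorting on
current-grouping-key() alone. Adding order="descending" to the first xsl:sort
would put the most frequent names first, if that is the order wanted.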
Although this is for a pre-generated transformation, it currently
takes a *hugely* long time, and I'm thinking I must be able to
optimize it somehow.
Any suggestions appreciated,
-James