Subject: RE: [xsl] distinct-values() optimization, sorting by frequency
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 8 Feb 2008 14:48:28 -0000

In the alphabetical list,

  count($persNames[normalize-space(lower-case(.)) = $current-name])

could be optimized by:

(a) using keys

(b) using Saxon-SA, which will optimize it to use a key automatically

(c) using xsl:for-each-group rather than distinct-values(), though that
will require some restructuring of your code.

In the frequency-sorted list, I think for-each-group would definitely be
better:

<xsl:for-each-group select="$persNames" group-by="lower-case(.)">
  <xsl:sort select="count(current-group())"/>
  ...

(Note also the option of using a case-blind collation rather than
lower-case(), discussed in another thread today)

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: James Cummings [mailto:cummings.james@xxxxxxxxx]
> Sent: 08 February 2008 14:28
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] distinct-values() optimization, sorting by frequency
>
> Hiya,
>
> I'm wondering the best way to optimize a distinct-values()
> based transformation. What I'm basically doing is:
> ======
> <xsl:variable name="docs"
>   select="collection('../../working/xml/files.xml')"/>
>
> <xsl:template name="main" >
>   <xsl:variable name="persNames"
>     select="$docs//tei:text//tei:persName"/>
>   <xsl:variable name="norm-persNames"
>     select="$persNames/normalize-space(lower-case(.))"/>
>   <xsl:variable name="distinct-persNames"
>     select="distinct-values($norm-persNames)"/>
>   <!-- I realize that I could be more specific on the
>   $persNames variable, but doing so doesn't seem to affect
>   speed much at all. -->
>   <div type="main">
>
>     <!-- Some overall counts -->
>     <div><head>Overall Counts</head>
>       <list type="unordered">
>         <item>Number of <gi>persName</gi> elements total:
>           <xsl:value-of select="count($persNames)"/></item>
>         <item>Number of <gi>persName</gi> elements which have a
>           @key attribute total: <xsl:value-of
>           select="count($persNames[@key])"/></item>
>         <item>Number of distinct-value <gi>persName</gi> elements total:
>           <xsl:value-of select="count($distinct-persNames)"/></item>
>       </list></div>
>
>     <!-- An Alphabetical List -->
>     <div><head>Alphabetical List</head>
>       <list type="unordered">
>         <xsl:for-each select="$distinct-persNames">
>           <xsl:sort select="."/>
>           <xsl:variable name="current-name" select="."/>
>           <xsl:variable name="count-distinct-current-name"
>             select="count($persNames[normalize-space(lower-case(.))
>             =$current-name])"/>
>           <item><xsl:value-of select="concat($current-name,
>             ' -- ', $count-distinct-current-name)"/></item>
>         </xsl:for-each>
>       </list>
>     </div>
>
>     <!-- A Frequency Sorted List -->
>     <div>
>       <head>Frequency List</head>
>       <list type="unordered">
>         <xsl:for-each select="$distinct-persNames">
>           <xsl:sort
>             select="count($persNames[normalize-space(lower-case(.))
>             = .])"/>
>           <!-- I think it is this sort statement which slows things
>           down, since I have to repeat it twice. -->
>           <xsl:variable name="current-name" select="."/>
>           <xsl:variable name="count-distinct-current-name"
>             select="count($persNames[normalize-space(lower-case(.))
>             = $current-name])"/>
>           <item><xsl:value-of select="concat($count-distinct-current-name,
>             ' -- ', $current-name)"/> </item>
>         </xsl:for-each>
>       </list>
>     </div>
>   </div>
> ======
>
> I think the real slow-down comes in the second xsl:for-each
> where I want to sort by frequency of distinct-value by doing:
>
>   <xsl:sort
>     select="count($persNames[normalize-space(lower-case(.)) = .])"/>
>
> I have to have it for the sort, and then I have to
> re-do it for the output inside the <item> element. I'm
> obviously not allowed a variable between the for-each and the
> sort... but I have a feeling I'm missing some clever
> optimization here.
>
> Although this is for a pre-generated transformation, it
> currently takes a *hugely* long time, and I'm thinking I must
> be able to optimize it somehow.
>
> Any suggestions appreciated,
>
> -James
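
A minimal sketch of the xsl:for-each-group approach suggested in the reply, applied to the frequency list from the original question. It assumes an XSLT 2.0 processor and that the tei prefix is bound to the TEI P5 namespace (the original message does not show its namespace declarations). The grouping key adds normalize-space() so that it matches the normalization used in the question; the reply itself shows only lower-case(.). This has not been run against the original data.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <!-- Assumption: tei is bound to the TEI P5 namespace; adjust if the
       source documents use a different one. -->

  <xsl:variable name="docs"
      select="collection('../../working/xml/files.xml')"/>

  <xsl:template name="main">
    <xsl:variable name="persNames"
        select="$docs//tei:text//tei:persName"/>

    <div>
      <head>Frequency List</head>
      <list type="unordered">
        <!-- Each group holds every persName that shares one normalized form,
             so no separate distinct-values() step is needed. -->
        <xsl:for-each-group select="$persNames"
            group-by="normalize-space(lower-case(.))">
          <!-- Sort groups by size (ascending by default, as in the reply);
               add order="descending" to list the most frequent names first. -->
          <xsl:sort select="count(current-group())"/>
          <item>
            <xsl:value-of
                select="concat(count(current-group()), ' -- ',
                               current-grouping-key())"/>
          </item>
        </xsl:for-each-group>
      </list>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Because count(current-group()) is taken from a group that has already been built, the processor should not need to rescan $persNames for every distinct name, which is the repeated work the question identifies. The alphabetical list could probably reuse the same grouping with <xsl:sort select="current-grouping-key()"/> in place of the count-based sort.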