Re: Clean data using XSLT

Subject: Re: Clean data using XSLT
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Mon, 07 Aug 2000 23:28:35 +0100
Michal,

>I use this XSLT on it:
><xsl:for-each select='//Property[not(.=preceding::Property)]'>
>     <xsl:sort select="@Id" />
>
>     <xsl:variable name="PropertyAddress" select="text()" />
>
>     <BR />
>     <xsl:value-of select="@Id" />) <xsl:value-of select="." />
>     (appears <xsl:value-of select="count(//Property[.=$PropertyAddress])" />
>times)
>
></xsl:for-each>
[snip]
>I would like to have the data also sorted by the number of times this
>version of the address appears in the data, so my output would be:

You've managed to describe what you want to do, which is always the
majority of the work :)  Let me show you my rudimentary translation:

"sorted"                      -> xsl:sort
"the number of times"         -> count()
"this version of the address" -> current()/text() [or just current()]
"the data"                    -> //Property

The data are the Property elements: //Property

You are only interested in the ones that have the same content as the
item you're currently sorting.  When you're only interested in a subset of
nodes, that calls for a predicate.  The predicate needs to test the content
of each Property element against the content of the node you're sorting:

  //Property[. = current()]

Then you want to count the number of nodes in this set:

  count(//Property[. = current()])

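So, if you add another sort using this as the 'select' expression, you can
do it.  Set data-type="number" so that the counts are compared as numbers
rather than as text (otherwise "10" sorts as if it were smaller than "9"),
and order="descending" so that the ones that appear most frequently appear
first in the list.  Because the @Id sort comes first, the count only decides
the order among items that share an @Id:

<xsl:for-each select='//Property[not(.=preceding::Property)]'>
     <xsl:sort select="@Id" />
     <xsl:sort select="count(//Property[. = current()])"
               data-type="number" order="descending" />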

     <xsl:variable name="PropertyAddress" select="text()" />

     <BR />
     <xsl:value-of select="@Id" />) <xsl:value-of select="." />
     (appears <xsl:value-of select="count(//Property[.=$PropertyAddress])" />
times)

</xsl:for-each>
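
If the difference between '.' and current() inside that predicate is not
obvious, a small test stylesheet may help.  This is only a sketch: the
RootNode/Property layout and the sample addresses in the comment are
assumptions made up for illustration, not your real data.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input, assumed for illustration only:
     <RootNode>
       <Property Id="1">10 High St</Property>
       <Property Id="1">10 High Street</Property>
       <Property Id="1">10 High St</Property>
     </RootNode>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:for-each select="//Property">
      <!-- Inside the predicate, "." is the Property being tested,
           while current() is the Property this for-each is processing. -->
      <xsl:value-of select="." />
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(//Property[. = current()])" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input above, each "10 High St" line reports 2 and the
"10 High Street" line reports 1.  Writing [. = .] instead would compare every
Property with itself and so count all of them.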

A slight change that I'd make for performance is to give the direct path to
the Property elements rather than using '//'.  It can be quite laborious
for processors to search the entire tree, and as you don't have nested
Property elements all over the place, there's no need to do so:

<xsl:for-each select='/RootNode/Property[not(.=preceding::Property)]'>
     <xsl:sort select="@Id" />
     <xsl:sort select="count(/RootNode/Property[. = current()])"
               data-type="number" order="descending" />

     <xsl:variable name="PropertyAddress" select="text()" />

     <BR />
     <xsl:value-of select="@Id" />) <xsl:value-of select="." />
     (appears <xsl:value-of select="count(/RootNode/Property[.=$PropertyAddress])" />
times)

</xsl:for-each>

The other thing is that I'd think about using keys to index the Property
elements both by @Id and by their content, so that it's easy to (a)
identify the first Property with a particular @Id (b) identify all the
other Properties with that @Id and (c) count how many Properties there are
with that same address:

<xsl:key name="property-ids" match="Property" use="@Id" />
<xsl:key name="property-address" match="Property" use="." />

...
<xsl:for-each select="/RootNode/Property[generate-id() =
                      generate-id(key('property-ids', @Id)[1])]">
     <xsl:sort select="@Id" />
     <xsl:sort select="count(key('property-address', .))"
               data-type="number" order="descending" />

     <xsl:variable name="PropertyAddress" select="text()" />

     <BR />
     <xsl:value-of select="@Id" />) <xsl:value-of select="." />
     (appears <xsl:value-of select="count(key('property-address', .))" />
times)

</xsl:for-each>
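
Note that picking the first Property for each @Id lists one item per @Id.  If
you would rather keep listing each distinct address once, as your original
not(.=preceding::Property) test does, the same trick should work with a key
on the content.  A minimal sketch, again assuming a /RootNode/Property
layout; the output method is also my assumption:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />

  <!-- xsl:key is a top-level element: a direct child of xsl:stylesheet. -->
  <xsl:key name="property-address" match="Property" use="." />

  <xsl:template match="/">
    <!-- Keep only the first Property in document order for each distinct
         address (string value); assumes the /RootNode/Property layout. -->
    <xsl:for-each select="/RootNode/Property[generate-id() =
                          generate-id(key('property-address', .)[1])]">
      <xsl:sort select="@Id" />
      <!-- Compare the counts as numbers, most frequent first. -->
      <xsl:sort select="count(key('property-address', .))"
                data-type="number" order="descending" />
      <BR />
      <xsl:value-of select="@Id" />) <xsl:value-of select="." />
      (appears <xsl:value-of select="count(key('property-address', .))" />
      times)
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The key lookup replaces both the preceding:: scan and the repeated
count(/RootNode/Property[...]) searches, so each address is found through the
index rather than by walking the document again.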
...

I hope this helps,

Jeni

Dr Jeni Tennison
Epistemics Ltd * Strelley Hall * Nottingham * NG8 6PE
tel: 0115 906 1301 * fax: 0115 906 1304 * email: jeni.tennison@xxxxxxxxxxxxxxxx


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

