RE: [xsl] how to remove duplicates from more than one file?

Subject: RE: [xsl] how to remove duplicates from more than one file?
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Mon, 9 Dec 2002 09:44:28 -0000

Your code works fine with Saxon. For reference, here is the complete
stylesheet:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="ISO-8859-1"
indent="yes"/>

<xsl:key name="items" match="item" use="@name"/>

<xsl:variable name="source">
     <xsl:copy-of select="document('test1.xml')//item | 
document('test2.xml')//item"/>
</xsl:variable>

<xsl:template match="/">
<xsl:for-each select="$source">
     <xsl:for-each 
select="//item[generate-id(.)=generate-id(key('items', @name)[1])]">
         <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>
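
One caveat: the stylesheet above iterates directly over $source, which in XSLT 1.0 is a result tree fragment, and a processor that follows the 1.0 rules strictly may refuse to treat it as a node-set. The variant below is only a sketch for that case. It assumes the processor implements the EXSLT common module (the exsl:node-set() function), and it adds an itemList wrapper element to match the output requested in the original question. The file names are the ones from my test and are otherwise arbitrary.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="ISO-8859-1"
indent="yes"/>

<xsl:key name="items" match="item" use="@name"/>

<!-- Copy the items from both files into one temporary tree -->
<xsl:variable name="source">
     <xsl:copy-of select="document('test1.xml')//item |
document('test2.xml')//item"/>
</xsl:variable>

<xsl:template match="/">
<itemList>
<!-- Assumption: exsl:node-set() is available; it turns the result
     tree fragment into a node-set holding the root of the temporary tree -->
<xsl:for-each select="exsl:node-set($source)">
     <!-- The context node is now in the temporary tree, so key()
          searches the items copied from both files -->
     <xsl:for-each
select="item[generate-id(.)=generate-id(key('items', @name)[1])]">
         <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:for-each>
</itemList>
</xsl:template>

</xsl:stylesheet>
```

The inner select uses "item" rather than "//item" because the copied items are direct children of the root of the temporary tree.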

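I can't think of any better way of doing it in XSLT 1.0. Neither the
Muenchian approach nor the preceding-sibling approach to elimination of
duplicates can handle multiple documents directly: key() only searches
the document that contains the context node, and the preceding-sibling
axis never crosses a document boundary. That is why the items have to
be copied into a single tree first. You could tune it slightly by
removing the "//" (for example, selecting "item" rather than "//item"
in the inner xsl:for-each, since the copied items are direct children
of the root of $source). In 2.0, of course, you can use
xsl:for-each-group or the new distinct-values() function.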
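
For illustration, here is a minimal sketch of the xsl:for-each-group approach. It assumes an XSLT 2.0 processor and the same two test files as above. The itemList wrapper is taken from the output requested in the original question. No key and no intermediate variable are needed, because the grouping population can span several documents.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

<xsl:template match="/">
<itemList>
     <!-- The comma operator keeps the items of test1.xml ahead of
          those of test2.xml; groups are formed on the name attribute -->
     <xsl:for-each-group
select="(document('test1.xml')//item, document('test2.xml')//item)"
group-by="@name">
         <!-- Inside the group the context item is the first item
              of that group, so this copies one item per name -->
         <xsl:copy-of select="."/>
     </xsl:for-each-group>
</itemList>
</xsl:template>

</xsl:stylesheet>
```

Groups are processed in order of first appearance, so the result follows the order in which each name is first seen across the input files.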

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Marcin Antczak
> Sent: 07 December 2002 21:04
> To: XSL-List@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] how to remove duplicates from more than one file?
> 
> 
> My input is:
> 
> root.xml
> 
> <root/>
> 
> 
> items_1.xml
> 
> <items>
> 	<item name='one'>value_1</item>
> 	<item name='one'>value_1</item>
> 	<item name='two'>value_2</item>
> 	<item name='three'>value_3</item>
> </items>
> 
> 
> items_2.xml (and more.... items_*.xml)
> 
> <items>
> 	<item name='one'>value_1</item>
> 	<item name='one'>value_1</item>
> 	<item name='two'>value_2</item>
> 	<item name='two'>value_2</item>
> 	<item name='one'>value_1</item>
> 	<item name='seven'>value_7</item>
> </items>
> 
> 
> And I need to generate output with items from all input files without 
> duplicates:
> 
> <itemList>
> 	<item name='one'>value_1</item>
> 	<item name='two'>value_2</item>
> 	<item name='three'>value_3</item>
> 	<item name='seven'>value_7</item>
> </itemList>
> 
> My first idea was to grab external data with the document() 
> function into a variable and then use the Muenchian method on the 
> node-set within this variable.
> 
> In my stylesheet I did something like this:
> 
> <xsl:key name="items" match="item" use="@name"/>
> 
> <xsl:variable name="source">
>      <xsl:copy-of select="document('items_1.xml')//item | 
> document('items_2.xml')//item"/>
> </xsl:variable>
> 
> <xsl:for-each select="$source">
>      <xsl:for-each 
> select="//item[generate-id(.)=generate-id(key('items', @name)[1])]">
>          <test_ok/>
>      </xsl:for-each>
> </xsl:for-each>
> 
> But on my Windows machine (Win 2000 + IIS 5.0 + PHP 4.2.3 + Sablotron 
> 0.96, server-side transformations) I get only segfaults.
> 
> On a Unix machine (FreeBSD) there were no errors, but no output 
> at all either.
> 
> Could you give me a hint on how to resolve this problem?
> 
> 
> Marcin Antczak
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 

