Subject: [xsl] Finding unique nodes in a non-sibling nodeset
From: Mike Berrow <mberrow@xxxxxxxxxxx>
Date: Sat, 29 Jun 2002 10:04:38 -0700
In a code generation transform that I am working on, I frequently need to eliminate duplicate expressions or event calls. The nodes whose commonality must be detected are often scattered around different parts of a large (preprocessed) reference document that is loaded with a document() call.

Previously, I had eliminated duplicates with something of the form

    $list[not(@key1=preceding-sibling::*/@key1)]

or, if I wanted to look back through the whole document,

    $list[not(@key1=preceding::*/@key1)]

In this situation, however:

[A] The nodes to be duplicate-trimmed are selected out of the reference document in very specific contextual ways (e.g. deep inside xsl:template / xsl:for-each usages).
[B] They are not all sibling nodes.
[C] The preceding axis can't be used, since it looks at the whole preceding part of the document, not just my carefully selected nodes.
[D] The definition of duplication requires multiple node attributes, i.e. it needs a composite key.

Even if [D] were not true, the preceding-sibling approach would not work because of [B], and the preceding approach would not work because of [C].

I eventually hit on a way to solve this (since I use Saxon) using saxon:tokenize, but I always wondered whether there was a non-extension way to do it. What I did was build a delimited aggregate string from the nodes in the set in question (held in a variable called $list), like so:

    <xsl:variable name="aggregate">
      <xsl:for-each select="$list">
        <xsl:value-of select="concat(@key1,'/',@key2)" />
        <xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
      </xsl:for-each>
    </xsl:variable>

Then use tokenize to get a node-set:

    <xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>

And eliminate the duplicates the standard (?) way with:

    <xsl:variable name="list4NoDups" select="$list4[not(.=preceding-sibling::*)]"/>

I'm then able to process the node subset I was trying to get, since the keys are embedded in the strings of the resulting node-set.

All was well until my colleague decided to try out Saxon 7.1, which (it turns out) changes the behavior of tokenize(). In that version, the node-set comes back in such a way that you can't use the preceding-sibling axis on it. There are features in Saxon 7.1 that we are very interested in, so I needed to find a different technique. It turns out that the following has exactly the desired effect (in one line!):

    <xsl:variable name="listNoDups" select="saxon:distinct($list, saxon:expression('concat(@key1,@key2)'))"/>

and I could have done that all along. However, I still wondered whether there was a way of doing this without extensions.

So I put the problem to my good friend Chris Maden (yes, *the* Chris Maden), though not in as much detail as I have given here. Chris said "Muenchian Keys!!" I hadn't yet used that technique anywhere (but had heard it mentioned a lot), so I decided to give it a whirl. Well, it does solve the problem, but with a restriction that makes it unusable for me. I set up my key like so:

    <xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>

Then used:

    <xsl:variable name="uniqueKey1Key2forFlavour" select="$list[generate-id()=generate-id(key('Key1Key2',concat(@key1,@key2)))]"/>

This does the trick, but I can't use it: xsl:key is a top-level element, so the selection has to be fixed in its match pattern, and I have situation [A] to deal with.

So, my questions are:

[1] Is there a non-extension, non-xsl:key way of doing this?
[2] If not, is there a better way than the saxon:distinct approach?

Thanks for bearing with me :-) I have attached my current test data, test transform and output, since they may help to clarify what I'm trying to do.
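For question [1], one possible plain XSLT 1.0 approach (a sketch, not tested against the real reference document) keeps the preceding axis but only counts preceding nodes that are also members of $list, using the set-membership test count(. | $list) = count($list). The stylesheet assumes the structure of the attached input.xml and reuses the sour-fact selection from dupElim.xsl as a stand-in; in real use, $list would be whatever context-specific expression [A] requires.

```xml
<?xml version="1.0"?>
<!-- Sketch: drop duplicates from $list on the pair (@key1, @key2)
     with plain XSLT 1.0, no extension functions and no xsl:key. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="document">
    <!-- Stand-in for a context-specific selection; any node-set of
         non-nested elements from one document should work here. -->
    <xsl:variable name="list" select="item[@flavour='sour']/fact"/>

    <xsl:for-each select="$list">
      <!-- Keep this node unless an earlier node (document order) that is
           itself a member of $list has the same key pair.
           count(. | $list) = count($list) holds only for members of $list.
           current() is the for-each node, even inside the predicates. -->
      <xsl:if test="not(preceding::*[@key1 = current()/@key1]
                                    [@key2 = current()/@key2]
                                    [count(. | $list) = count($list)])">
        <xsl:value-of select="concat(@key1, '/', @key2)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the attached input.xml this should print XX/CC, XX/BB and YY/BB, the same set as the saxon:distinct version. Two caveats, as far as I can tell: because the test relies on current(), it has to live inside the xsl:for-each, so the filtered nodes are processed in place rather than captured in a node-set variable; and each node scans its whole preceding axis, so the cost likely grows quadratically on a large reference document.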
-- 
Mike Berrow

========== input.xml ==============

<document>
  <item flavour="sweet">
    <fact key1="AA" key2="BB" val="11"/>
    <fact key1="XX" key2="CC" val="22"/>
    <fact key1="AA" key2="BB" val="33"/>
  </item>
  <item flavour="sour">
    <fact key1="XX" key2="CC" val="11"/>
    <fact key1="XX" key2="BB" val="33"/>
    <fact key1="YY" key2="BB" val="22"/>
  </item>
  <item flavour="sweet">
    <fact key1="XX" key2="CC" val="33"/>
    <fact key1="XX" key2="BB" val="22"/>
    <fact key1="AA" key2="BB" val="11"/>
  </item>
  <item flavour="sour">
    <fact key1="YY" key2="BB" val="33"/>
    <fact key1="XX" key2="CC" val="11"/>
    <fact key1="YY" key2="BB" val="22"/>
  </item>
</document>

========== dupElim.xsl ==============

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://icl.com/saxon" version="1.0">
<!-- Finding unique nodes in a non-sibling nodeset... by Mike Berrow -->
<xsl:output method="xml"/>

<xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>

<xsl:template match="document">
  <!-- Select nodes of interest -->
  <xsl:variable name="list" select="item[@flavour='sour']/fact"/>

  <!-- Single value, attempt 1 -->
  <xsl:comment>For $list[not(@key1=preceding-sibling::*/@key1)]</xsl:comment>
  <xsl:text>&#10;&#9;</xsl:text><xsl:comment>We get ...</xsl:comment>
  <xsl:variable name="list1NoDups" select="$list[not(@key1=preceding-sibling::*/@key1)]"/>
  <xsl:for-each select="$list1NoDups">
    <xsl:text>&#10;&#9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#10;&#9;</xsl:text>
  <xsl:comment>Not desired: 'preceding-sibling' can't see 'preceding cousin'</xsl:comment><xsl:text>&#10;&#10;</xsl:text>

  <!-- Single value, attempt 2 -->
  <xsl:comment>For $list[not(@key1=preceding::*/@key1)]</xsl:comment>
  <xsl:text>&#10;&#9;</xsl:text><xsl:comment>We get ...</xsl:comment>
  <xsl:variable name="list2NoDups" select="$list[not(@key1=preceding::*/@key1)]"/>
  <xsl:for-each select="$list2NoDups">
    <xsl:text>&#10;&#9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#10;&#9;</xsl:text>
  <xsl:comment>Not desired: 'preceding' looks at the whole doc</xsl:comment><xsl:text>&#10;&#10;</xsl:text>

  <!-- Try Multi-value -->
  <xsl:comment>For $list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]</xsl:comment>
  <xsl:text>&#10;&#9;</xsl:text><xsl:comment>We get ...</xsl:comment>
  <xsl:variable name="list3NoDups" select="$list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]"/>
  <xsl:for-each select="$list3NoDups">
    <xsl:text>&#10;&#9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#10;&#9;</xsl:text>
  <xsl:comment>Not desired: result of a naive composite key attempt</xsl:comment><xsl:text>&#10;&#10;</xsl:text>

  <!-- Multi-value using saxon:tokenize -->
  <xsl:comment>Using aggregation, saxon:tokenize then 'not(.=preceding-sibling::*)'</xsl:comment>
  <xsl:variable name="aggregate">
    <xsl:for-each select="$list">
      <xsl:value-of select="concat(@key1,'/',@key2)" />
      <xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>
  <xsl:variable name="list4NoDups" select="$list4[not(.=preceding-sibling::*)]"/>
  <xsl:for-each select="$list4NoDups">
    <xsl:text>&#10;&#9;</xsl:text>
    <xsl:value-of select="." />
  </xsl:for-each>
  <xsl:text>&#10;&#9;</xsl:text>
  <xsl:comment>Which is the desired result</xsl:comment><xsl:text>&#10;&#10;</xsl:text>

  <!-- Multi-value using saxon:distinct -->
  <xsl:comment>saxon:distinct($list, saxon:expression('concat(@key1,@key2)')</xsl:comment>
  <xsl:for-each select="saxon:distinct($list, saxon:expression('concat(@key1,@key2)'))">
    <xsl:text>&#10;&#9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#10;&#9;</xsl:text>
  <xsl:comment>Which is tighter code than using tokenize</xsl:comment><xsl:text>&#10;&#10;</xsl:text>

  <!-- Multi-value using Muenchian -->
  <xsl:comment>Using <xsl:text>&lt;xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/&gt;</xsl:text> and select="$list[generate-id(.)=generate-id(key('Key1Key2',concat(@key1,@key2)))]"</xsl:comment>
  <xsl:variable name="uniqueKey1Key2forFlavour" select="$list[generate-id()=generate-id(key('Key1Key2',concat(@key1,@key2)))]"/>
  <xsl:for-each select="$uniqueKey1Key2forFlavour">
    <xsl:text>&#10;&#9;</xsl:text>
    <xsl:value-of select="concat(@key1,'/',@key2)" />
  </xsl:for-each>
  <xsl:text>&#10;&#9;</xsl:text>
  <xsl:comment>Which is the Muenchian approach, but since xsl:key is a top level element, this will not help when nodesets need to be calculated in specific, non-whole-document contexts</xsl:comment><xsl:text>&#10;&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>

========== minSet.xml ==============

<?xml version="1.0" encoding="utf-8"?>
<!--For $list[not(@key1=preceding-sibling::*/@key1)]-->
    <!--We get ...-->
    XX/CC
    YY/BB
    YY/BB
    XX/CC
    <!--Not desired: 'preceding-sibling' can't see 'preceding cousin'-->

<!--For $list[not(@key1=preceding::*/@key1)]-->
    <!--We get ...-->
    YY/BB
    <!--Not desired: 'preceding' looks at the whole doc-->

<!--For $list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]-->
    <!--We get ...-->
    XX/CC
    XX/BB
    YY/BB
    YY/BB
    XX/CC
    YY/BB
    <!--Not desired: result of a naive composite key attempt-->

<!--Using aggregation, saxon:tokenize then 'not(.=preceding-sibling::*)'-->
    XX/CC
    XX/BB
    YY/BB
    <!--Which is the desired result-->

<!--saxon:distinct($list, saxon:expression('concat(@key1,@key2)')-->
    XX/CC
    XX/BB
    YY/BB
    <!--Which is tighter code than using tokenize-->

<!--Using <xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/> and select="$list[generate-id(.)=generate-id(key('Key1Key2',concat(@key1,@key2)))]"-->
    XX/CC
    XX/BB
    YY/BB
    <!--Which is the Muenchian approach, but since xsl:key is a top level element, this will not help when nodesets need to be calculated in specific, non-whole-document contexts-->

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
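Regarding the restriction noted in the last comment of dupElim.xsl: if I read [A] correctly, only the key's match pattern is top-level, not the selection it is used against. A possible variation, sketched here as an assumption rather than a tested solution, indexes every fact element with a broad key and then intersects each key lookup with the context-specific $list, so the Muenchian test still yields a node-set variable. The key name factByKeys is hypothetical, and the '/' separator in the composite key is an addition of mine so that, for example, 'A'+'BC' and 'AB'+'C' cannot collide.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical key: indexes every fact, with no context restriction
       in the match pattern; the '/' separator keeps the two parts of
       the composite key apart. -->
  <xsl:key name="factByKeys" match="fact" use="concat(@key1, '/', @key2)"/>

  <xsl:template match="document">
    <!-- Stand-in for a context-specific selection, as in dupElim.xsl -->
    <xsl:variable name="list" select="item[@flavour='sour']/fact"/>

    <!-- A node survives if it is the first node (document order) among the
         key matches that are also members of $list.  generate-id() of a
         node-set returns the id of its first node in document order. -->
    <xsl:variable name="listNoDups"
        select="$list[generate-id() =
                      generate-id(key('factByKeys', concat(@key1, '/', @key2))
                                  [count(. | $list) = count($list)])]"/>

    <xsl:for-each select="$listNoDups">
      <xsl:value-of select="concat(@key1, '/', @key2)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the attached input.xml this should again give XX/CC, XX/BB and YY/BB. key() searches the document containing the context node, so with nodes from a document() call the lookup should stay within the reference document, provided the match pattern is broad enough to cover every node that could appear in $list.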