Subject: Re: Fw: [xsl] Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Tue, 24 Aug 2010 20:52:20 +0200

Michael,

> I haven't understood your logic in any detail, but I wonder if it
> suggests an alternative approach to the problem: namely, avoid creating
> RTFs entirely, at least for intermediate results. Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult either.

That is a cool idea. Reading your suggestion of a whitespace-separated
list of strings, I thought of the id() function. This function can do the
duplicate elimination "for free"!

Given a document whose DOCTYPE declares an ID attribute, and a
whitespace-separated string of IDs, a call of id() with that string not
only returns all the nodes with the given IDs, it also eliminates
duplicate nodes, because the result is a node-set.

I figured out how to create the DOCTYPE declaration while creating output,
by using xsl:text. Generating such an output XML file works perfectly, as
can be seen in the demo idc.xsl [1] and below. File idc2.xml is the output
generated by calling template idcopy for file simple2.xml.

The big question now is whether exslt:node-set() supports DOCTYPE
declarations, and how. idc.xsl shows an attempt that does not work.
Accessing an element by its ID works for document('idc2.xml') but does
not work for document(exslt:node-set($rtf)), although both are generated
identically by a call to template idcopy. The difference seems to be the
parsing from file idc2.xml ...

Is DOCTYPE supported by exslt:node-set()?
Is the generation of DOCTYPE by <xsl:text> OK for this purpose?
Can the id() function be made to work for duplicate elimination in some
other way?

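The duplicate elimination that id() provides can be seen in isolation with a small sketch. The input file name ids.xml, the element names list and item, and the ID values x1 and x2 are all made up for illustration; the sketch assumes the XML parser reads the internal DTD subset so that the id attribute is recognized as type ID.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input document (ids.xml); its internal DTD subset
     declares the "id" attribute of "item" as type ID:

<!DOCTYPE list [
  <!ATTLIST item id ID #REQUIRED>
]>
<list><item id="x1">one</item><item id="x2">two</item></list>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- "x1" appears twice in the argument string, but id() returns a
         node-set, so the matching element is counted only once. -->
    <xsl:value-of select="count(id('x1 x2 x1'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

If the ID declaration is honored, the count is 2 rather than 3. If the processor does not see the DTD, id() finds nothing and the count is 0, which is the situation the questions above are about.
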
$ xsltproc idc.xsl simple2.xml

----------
<node id="id2335172" type="text" value="4"/>
$ cat simple2.xml
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>
$ cat idc2.xml

<!DOCTYPE node [
  <!ATTLIST node id ID #REQUIRED>
]>
<node id="id2335401" type="element" name="a"><node id="id2335402" type="text" value=" "/><node id="id2335404" type="element" name="b"><node id="id2335405" type="text" value=" "/><node id="id2335406" type="element" name="c"><node id="id2335407" type="text" value="1"/></node><node id="id2335408" type="text" value=" "/><node id="id2335409" type="element" name="c"><node id="id2335162" type="text" value="2"/></node><node id="id2335163" type="text" value=" "/></node><node id="id2335164" type="text" value=" "/><node id="id2335165" type="element" name="b"><node id="id2335166" type="text" value=" "/><node id="id2335167" type="element" name="c"><node id="id2335168" type="text" value="3"/></node><node id="id2335169" type="text" value=" "/><node id="id2335170" type="element" name="c"><node id="id2335172" type="text" value="4"/></node><node id="id2335173" type="text" value=" "/></node><node id="id2335174" type="text" value=" "/></node>
$
$ cat idc.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt"
>
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:key name="nodes-by-id" match="node()" use="@id"/>

  <xsl:template match="/">
    <xsl:variable name="rtf">
      <xsl:call-template name="idcopy"/>
    </xsl:variable>

    <xsl:variable name="id1" select=
      "string(exslt:node-set($rtf)//node[@type='text'][@value='4']/@id)"/>

    <xsl:for-each select="document(exslt:node-set($rtf))">
      <xsl:copy-of select="id($id1)"/>
    </xsl:for-each>

    <xsl:text>
----------
</xsl:text>

    <xsl:variable name="id2" select=
      "string(document('idc2.xml')//node[@type='text'][@value='4']/@id)"/>

    <xsl:for-each select="document('idc2.xml')">
      <xsl:copy-of select="id($id2)"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="idcopy">
    <xsl:text disable-output-escaping="yes">
<![CDATA[<!DOCTYPE node [
  <!ATTLIST node id ID #REQUIRED>
]>]]>
</xsl:text>

    <xsl:choose>
      <xsl:when test="count(. | ../namespace::*) != count(../namespace::*)">
        <xsl:apply-templates select="." mode="idcopy"/>
      </xsl:when>
      <xsl:otherwise>
        <node id="{generate-id()}" type="namespace"
              name="{name()}" value="{.}"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="@*" mode="idcopy">
    <node id="{generate-id()}" type="attribute"
          name="{name()}" value="{.}"/>
  </xsl:template>

  <xsl:template match="node()" mode="idcopy">
    <node id="{generate-id()}" type="element" name="{name()}">
      <xsl:apply-templates select="@*" mode="idcopy"/>

      <xsl:for-each select="namespace::*">
        <xsl:if test="not(.=../../namespace::*) and name()!='xml'">
          <node id="{generate-id()}" type="namespace"
                name="{name()}" value="{.}"/>
        </xsl:if>
      </xsl:for-each>

      <xsl:apply-templates mode="idcopy"
        select="*|text()|comment()|processing-instruction()"/>
    </node>
  </xsl:template>

  <xsl:template match="comment()" mode="idcopy">
    <node id="{generate-id()}" type="comment" value="{.}"/>
  </xsl:template>

  <xsl:template match="processing-instruction()" mode="idcopy">
    <node id="{generate-id()}" type="processing-instruction" value="{.}"/>
  </xsl:template>

  <xsl:template match="text()" mode="idcopy">
    <node id="{generate-id()}" type="text" value="{.}"/>
  </xsl:template>

</xsl:stylesheet>
$

[1] http://stamm-wilbrandt.de/en/xsl-list/idc.xsl

Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


From:    Michael Kay <mike@xxxxxxxxxxxx>
To:      xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:    08/24/2010 02:17 PM
Subject: Re: Fw: [xsl] Question on duplicate node elimination

I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
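
The string operation described in the quoted message could look roughly like the following XSLT 1.0 sketch. It is only one possible reading of the suggestion, not code from the thread: the template name dedupe-ids and its parameters ids and seen are hypothetical, and the driver template assumes an input shaped like simple2.xml (it selects c elements).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: walks the space-separated list $ids and
       emits each token, followed by a space, the first time it occurs.
       $seen holds the tokens met so far, with a space on both sides
       of every token so that contains() only matches whole tokens. -->
  <xsl:template name="dedupe-ids">
    <xsl:param name="ids"/>
    <xsl:param name="seen" select="' '"/>
    <xsl:variable name="list" select="normalize-space($ids)"/>
    <xsl:if test="$list != ''">
      <xsl:variable name="head"
        select="substring-before(concat($list, ' '), ' ')"/>
      <xsl:if test="not(contains($seen, concat(' ', $head, ' ')))">
        <xsl:value-of select="concat($head, ' ')"/>
      </xsl:if>
      <!-- Recurse on the remaining tokens; the tail is empty after
           the last token, which ends the recursion. -->
      <xsl:call-template name="dedupe-ids">
        <xsl:with-param name="ids" select="substring-after($list, ' ')"/>
        <xsl:with-param name="seen" select="concat($seen, $head, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Driver for illustration only: lists the generate-id() values of
       two overlapping selections, then removes the repeated ones. -->
  <xsl:template match="/">
    <xsl:variable name="ids">
      <xsl:for-each select="//c">
        <xsl:value-of select="concat(generate-id(), ' ')"/>
      </xsl:for-each>
      <xsl:for-each select="//c[. &gt; 2]">
        <xsl:value-of select="concat(generate-id(), ' ')"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:call-template name="dedupe-ids">
      <xsl:with-param name="ids" select="string($ids)"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

For simple2.xml the driver builds a list of six tokens, two of which are repeats, so four remain after the call. The variable is used only through its string value and never converted back to a node-set. The recursion is one level deep per token and each step scans the seen list, so this form is only suitable for short lists.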