Re: [xsl] How to do this tricky elimination on XML using XSLT 2.0?

Subject: Re: [xsl] How to do this tricky elimination on XML using XSLT 2.0?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Tue, 19 Jun 2012 15:20:46 +0100
I think I would tackle this in two passes. First use xsl:for-each-group to identify the nodes to be removed; then do a modified identity transform that retains only the nodes not in this list.

The first pass is something like this:

<!-- Two nodes with the same name and id are considered repetitive if they appear one after another and have the same method and children. -->
<xsl:variable name="removed-nodes" as="element(*)*">
  <xsl:for-each-group select="//blockA/*" group-by="concat(@id, '~', @method, '~', otherchild)">
    <xsl:sequence select="subsequence(current-group(), 2)"/>
  </xsl:for-each-group>
</xsl:variable>


The second pass is:

<xsl:template match="*">
  <xsl:if test="empty(. intersect $removed-nodes)">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
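
To see how the two passes fit together, here is a minimal sketch of a complete stylesheet. It reuses the `removed-nodes` variable unchanged and, as one possible variation, writes the second pass as an identity template plus an empty template whose match pattern tests membership in `$removed-nodes` (XSLT 2.0 permits references to global variables in patterns). It assumes each candidate element has at most one `otherchild` child, because `concat()` does not accept a sequence of several items as an argument. Note also that the grouping key as written ignores the element name, the enclosing grid, and adjacency, so it would probably need tightening to meet the full requirements.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- First pass: every member of a group except the first is marked for removal. -->
  <xsl:variable name="removed-nodes" as="element(*)*">
    <xsl:for-each-group select="//blockA/*"
        group-by="concat(@id, '~', @method, '~', otherchild)">
      <xsl:sequence select="subsequence(current-group(), 2)"/>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Second pass, part 1: identity transform for all nodes and attributes. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Second pass, part 2: an empty template that drops the marked elements.
       Its default priority (0.5) is higher than that of the identity template. -->
  <xsl:template match="*[. intersect $removed-nodes]"/>

</xsl:stylesheet>
```

Unlike the `match="*"` template, this identity template also copies comments and processing instructions; whether that is wanted depends on the desired output.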

Michael Kay
Saxonica

On 19/06/2012 10:14, Jo Na wrote:
Hi,
I have this input xml:
     <map>
         <region>
             <gridA id="1">
                 <blockA id="01" method="build">
                     <building1 id="x" method="build">
                         <otherchild>a</otherchild>
                     </building1>
                     <building1 id="x" method="build">  <!-- this one
will be removed -->
                         <otherchild>a</otherchild>
                     </building1>
                 </blockA>

                 <blockA id="01">
                     <building1 id="x" method="modify">
                         <otherchild>a</otherchild>
                     </building1>
                     <building1 id="x" method="build">  <!-- this one will be kept (previous node has the same id but a different method, so it is not considered successive) -->
                         <otherchild>a</otherchild>
                     </building1>
                 </blockA>

                 <blockA id="02">
                     <building3 id="y" method="modify">
                         <otherchild>b</otherchild>
                     </building3>
                     <building2 id="x" method="demolish"/>
                 </blockA>

                 <blockA id="01">
                     <building1 id="y" method="build">  <!-- this one
will be kept (diff id) -->
                         <otherchild>a</otherchild>
                     </building1>
                     <building1 id="x" method="build">  <!-- this one
will be removed -->
                         <otherchild>a</otherchild>
                     </building1>
                 </blockA>

                 <blockA id="02">
                     <building3 id="y" method="modify">  <!-- this one
will be removed -->
                         <otherchild>b</otherchild>
                     </building3>
                     <building2 id="x" method="demolish"/>  <!-- this
one will be removed -->
                 </blockA>
             </gridA>

             <gridA id="2">
                 <blockA id="01" method="build">
                     <building1 id="x" method="build">
                         <otherchild>a</otherchild>
                     </building1>
                     <building1 id="x" method="build">  <!-- this one
will be removed -->
                         <otherchild>a</otherchild>
                     </building1>
                     <building1 id="x" method="build">  <!-- this one
will be kept (diff children) -->
                         <otherchild>b</otherchild>
                     </building1>
                 </blockA>
                 <blockA id="01">
                     <building1 id="x" method="build">  <!-- this one
will be removed -->
                         <otherchild>b</otherchild>
                     </building1>
                 </blockA>
             </gridA>
             <gridB id="1">
                 ...and so on..
             </gridB>
         </region>
     </map>

Expected Output:

     <map>
         <region>
             <gridA id="1">
                 <blockA id="01" method="build">
                     <building1 id="x" method="build">
                         <otherchild>a</otherchild>
                     </building1>
                 </blockA>

                 <blockA id="01">
                     <building1 id="x" method="modify">
                         <otherchild>a</otherchild>
                     </building1>
                     <building1 id="x" method="build">  <!-- this one will be kept (previous node has the same id but a different method, so it is not considered successive) -->
                         <otherchild>a</otherchild>
                     </building1>
                 </blockA>

                 <blockA id="02">
                     <building3 id="y" method="modify">
                         <otherchild>b</otherchild>
                     </building3>
                     <building2 id="x" method="demolish"/>
                 </blockA>

                 <blockA id="01">
                     <building1 id="y" method="build">  <!-- this one
will be kept (diff id) -->
                         <otherchild>a</otherchild>
                     </building1>
                 </blockA>

                 <blockA id="02"/>
             </gridA>

             <gridA id="2">
                 <blockA id="01" method="build">
                     <building1 id="x" method="build">
                         <otherchild>a</otherchild>
                     </building1>

                     <building1 id="x" method="build">  <!-- this one
will be kept (diff children) -->
                         <otherchild>b</otherchild>
                     </building1>
                 </blockA>
                 <blockA id="01"/>
             </gridA>
             <gridB id="1">
                 ...and so on..
             </gridB>
         </region>
     </map>
The XSLT so far:

     <xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:output indent="yes"/>  <xsl:strip-space elements="*"/>

         <xsl:template match="node()|@*">
             <xsl:copy>
                 <xsl:apply-templates select="node()|@*"/>
             </xsl:copy>
         </xsl:template>

         <xsl:template match="region/*/*/*
              [deep-equal(.,preceding::*[name()=current()/name()]
                            [@id = current()/@id]
                            [../../@id = current()/../../@id][1])]" />
     </xsl:stylesheet>

The problem with the XSLT right now is that it cannot differentiate
duplicates that occur across siblings (i.e. blockA elements with the same id).

I need to remove nodes that are considered *repetitive*.

**Two nodes that have the same `name` and `id` are considered
*repetitive* if they appear one after another and have the same
`method` and `children`.**

**for example:**

     <elem id="1" method="a" />
     <elem id="1" method="a" />  <!-- this is repetitive for id=1-->
     <elem id="1" method="b" />
     <elem id="1" method="a" />  <!-- this is the new boundary for removal id=1-->
     <elem id="2" method="a" />
     <elem id="1" method="a" />  <!-- this is repetitive for id=1 -->
     <elem id="2" method="a" />  <!-- this is repetitive for id=2 -->

**will be simplified into:**

     <elem id="1" method="a" />
     <elem id="1" method="b" />
     <elem id="1" method="a" />  <!-- this is the new boundary for removal id=1-->
     <elem id="2" method="a" />
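
The simplification above can be sketched with nested grouping: group the elements by `@id`, then split each group into runs of adjacent equal `@method` values with `group-adjacent`, and mark everything after the first member of each run. This is only a sketch under assumptions: the `elem` elements are taken to sit under some single wrapper element (the example shows none, so any root will do), the variable name `repetitive` is made up for illustration, and only `@method` is compared, not child content.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements to drop: within each id, all but the first of each run
       of consecutive elements that share the same method. -->
  <xsl:variable name="repetitive" as="element()*">
    <xsl:for-each-group select="//elem" group-by="@id">
      <!-- "Adjacent" here means adjacent among the elements with this id,
           so elements with other ids in between do not break a run. -->
      <xsl:for-each-group select="current-group()"
          group-adjacent="string(@method)">
        <xsl:sequence select="subsequence(current-group(), 2)"/>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Copy every element that is not marked as repetitive. -->
  <xsl:template match="*">
    <xsl:if test="empty(. intersect $repetitive)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

For the seven `elem` elements above, the id=1 group splits into the runs (1st, 2nd), (3rd), (4th, 6th), and the id=2 group forms the single run (5th, 7th), so the 2nd, 6th and 7th elements are marked for removal.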

  **- Every time a successive node with the `same id` has a `different method`,
    the `boundary` for the next removal for that `id` is reset.**

  - we need to take into account duplicates that are under one parent
or under sibling parents (two or more parent nodes that have the same element name
and id), i.e. `blockX` in the example
  - if the two nodes being compared do not share the same `gridX`
level, then they should not be considered duplicates to be removed
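
Taken together, these rules could be approximated by three levels of grouping: first by the enclosing `gridX` element, then by element name and `id`, and finally by runs of adjacent elements with the same `method` and children. The following is a hedged sketch, not a definitive solution. It assumes the candidate elements are exactly those matched by `region/*/*/*`, and it approximates "same children" with a string signature built from the names and string values of the child elements, which ignores attributes and deeper structure on the children. The variable name `repetitive` is made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="repetitive" as="element()*">
    <!-- Level 1: never compare elements that belong to different grids. -->
    <xsl:for-each-group select="//region/*/*/*"
        group-by="generate-id(../..)">
      <!-- Level 2: only elements with the same name and id are compared. -->
      <xsl:for-each-group select="current-group()"
          group-by="concat(name(), '~', @id)">
        <!-- Level 3: a change of method or children starts a new run,
             which resets the boundary for removal. -->
        <xsl:for-each-group select="current-group()"
            group-adjacent="concat(@method, '~',
                string-join(*/concat(name(), '=', string(.)), '|'))">
          <xsl:sequence select="subsequence(current-group(), 2)"/>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Modified identity transform: copy elements not marked as repetitive. -->
  <xsl:template match="*">
    <xsl:if test="empty(. intersect $repetitive)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Grouping on `generate-id(../..)` rather than on the grid's `@id` keeps `gridA id="1"` and `gridB id="1"` apart. If the children can be more deeply nested, a richer signature (or a pairwise `deep-equal()` comparison) would be needed in place of the string key.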

Please let me know how to achieve such transformation using XSLT 2.0.
Thanks very much for the help.
