RE: [xsl] Grouping elements in xslt 1.0

Subject: RE: [xsl] Grouping elements in xslt 1.0
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 1 Dec 2009 10:26:11 -0000
This is a pretty tough problem even with XSLT 2.0, because unlike the usual
grouping problems, you seem to be applying a more complex test to determine
whether two elements are duplicates. It's not actually 100% clear from your
post what test you are applying (you say in the prose that the elements must
be "similar", which is pretty vague), but it looks from your example as if
they must be deep-equal in the XSLT 2.0 sense, perhaps after stripping
whitespace.

Also, you say your solution "does not generate the expected output" -
presumably you mean it doesn't generate the desired output, because there is
no reason to expect this code to remove duplicates.

The solution depends partly on your data volumes. Can you afford the cost of
comparing every element with every other (n^2 comparisons)? It also depends
on whether you really are forced to use XSLT 1.0 (which is rather like
putting shelves up without being allowed to use a power drill).
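
To make the n^2 option concrete, here is a minimal sketch. It assumes (and
this is only my guess at the requirement) that two ID elements count as
duplicates when their value and namespace attributes are both present and
equal. Each ID is checked against all of its preceding ID siblings, which is
where the quadratic cost comes from.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: "duplicate" means same @value and same @namespace.
       An ID is copied only if no earlier sibling ID matches it on both.
       current() is used in the test rather than in the match pattern,
       because XSLT 1.0 does not allow current() inside a pattern. -->
  <xsl:template match="ID">
    <xsl:if test="not(preceding-sibling::ID[@value = current()/@value
                      and @namespace = current()/@namespace])">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

This is fine for a handful of siblings per parent. If either attribute can be
absent the comparison is false, so such elements would never be treated as
duplicates; a real equality test would need to handle that.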

The normal 1.0 approach is Muenchian grouping (an xsl:key indexes the
elements and generate-id() picks out the first element of each group; look
it up if you haven't met it), but the difficulty in this case is computing a
suitable grouping key. Before trying to tackle that, I think it would be best
to have a clearer statement of requirements.
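
In the meantime, here is a sketch of the Muenchian pattern under the simplest
reading of the requirement: no duplicate removal at all, just one <a> wrapper
per element name within each parent, with one <o> per occurrence inside it.
The grouping key is then only the parent's generated id plus the element
name. The key name "siblings-by-name", the "group" mode and the "content"
named template are my own inventions, and sorting the attributes by name is
an assumption taken from the order shown in the desired output.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every element by its parent's identity plus its own name. -->
  <xsl:key name="siblings-by-name" match="*"
           use="concat(generate-id(..), '|', name())"/>

  <!-- Document element: an <o> inside <data>, with no <a> wrapper. -->
  <xsl:template match="/*">
    <data>
      <o t="TVEQE_{name()}">
        <xsl:call-template name="content"/>
      </o>
    </data>
  </xsl:template>

  <!-- Called only for the first element of each group: emit one <a>,
       then one <o> for every same-named sibling in that group. -->
  <xsl:template match="*" mode="group">
    <a n="TVEQE_{name()}">
      <xsl:for-each select="key('siblings-by-name',
                                concat(generate-id(..), '|', name()))">
        <o t="TVEQE_{name()}">
          <xsl:call-template name="content"/>
        </o>
      </xsl:for-each>
    </a>
  </xsl:template>

  <!-- Attributes of the context element, then its child elements with
       only the first member of each name group selected. -->
  <xsl:template name="content">
    <xsl:for-each select="@*">
      <xsl:sort select="name()"/>
      <a n="{name()}"><v s="{.}"/></a>
    </xsl:for-each>
    <xsl:apply-templates mode="group"
        select="*[generate-id() = generate-id(
                  key('siblings-by-name',
                      concat(generate-id(..), '|', name()))[1])]"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that this merges same-named siblings even when other elements sit
between them, and it ignores text content. If the real requirement is to
remove duplicates, the key has to capture whatever makes two elements
"similar", which is the hard part.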

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 

 

> -----Original Message-----
> From: lee qm [mailto:akimilee@xxxxxxxxx] 
> Sent: 01 December 2009 07:50
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Grouping elements in xslt 1.0
> 
> I am using xslt 1.0.
> 
> Input xml (simplified version):
> 
> <?xml version="1.0" encoding="utf-8"?>
> <ASBMessage id="id" version="version" timestamp="timestamp">
>   <Body items="items">
>      <AccountData operation="operation">
>         <Account>
>            <ID value="value" namespace="namespace"/>
>            <ID value="value" namespace="namespace"/>
>            <Type id="id"/>
>         </Account>
>      </AccountData>
>   </Body>
> </ASBMessage>
> 
> 
> Expected output:
> 
> <?xml version="1.0" encoding="utf-8"?>
> <data>
>   <o t="TVEQE_ASBMessage">
>      <a n="id">
>         <v s="id"/>
>      </a>
>      <a n="timestamp">
>         <v s="timestamp"/>
>      </a>
>      <a n="version">
>         <v s="version"/>
>      </a>
>      <a n="TVEQE_Body">
>         <o t="TVEQE_Body">
>            <a n="items">
>               <v s="items"/>
>            </a>
>            <a n="TVEQE_AccountData">
>               <o t="TVEQE_AccountData">
>                  <a n="operation">
>                     <v s="operation"/>
>                  </a>
>                  <a n="TVEQE_Account">
>                     <o t="TVEQE_Account">
>                        <a n="TVEQE_ID">
>                           <o t="TVEQE_ID">
>                              <a n="namespace">
>                                 <v s="namespace"/>
>                              </a>
>                              <a n="value">
>                                 <v s="value"/>
>                              </a>
>                           </o>
>                           <o t="TVEQE_ID">
>                              <a n="namespace">
>                                 <v s="namespace"/>
>                              </a>
>                              <a n="value">
>                                 <v s="value"/>
>                              </a>
>                           </o>
>                        </a>
>                        <a n="TVEQE_Type">
>                           <o t="TVEQE_Type">
>                              <a n="id">
>                                 <v s="id"/>
>                              </a>
>                           </o>
>                        </a>
>                     </o>
>                  </a>
>               </o>
>            </a>
>         </o>
>      </a>
>   </o>
> </data>
> 
> 
> My xslt below does not generate the expected output for 
> similar elements that appear more than once.
> 
> 
> <xsl:template match="/ASBMessage">
>  <data>
>  <o t="TVEQE_{name(.)}">
>  <xsl:for-each select="@*">
>  <a n="{name(.)}"><v s="{.}"></v></a>
>  </xsl:for-each>
>  <xsl:apply-templates/>
>   </o>
>  </data>
> </xsl:template>
> 
> <xsl:template match="*">
>  <a n="TVEQE_{name(.)}">
>  <o t="TVEQE_{name(.)}">
>  <xsl:for-each select="@*">
>  <a n="{name(.)}"><v s="{.}"></v></a>
>  </xsl:for-each>
>  <xsl:apply-templates/>
>  </o>
>  </a>
>  </xsl:template>
> 
> 
> For example, for the <ID> element, which appears twice, it generates:
> 
>                        <a n="TVEQE_ID">
>                           <o t="TVEQE_ID">
>                              <a n="namespace">
>                                 <v s="namespace"/>
>                              </a>
>                              <a n="value">
>                                 <v s="value"/>
>                              </a>
>                           </o>
>                        </a>
> 
>                        <a n="TVEQE_ID">
>                           <o t="TVEQE_ID">
>                              <a n="namespace">
>                                 <v s="namespace"/>
>                              </a>
>                              <a n="value">
>                                 <v s="value"/>
>                              </a>
>                           </o>
>                        </a>
> 
> 
> The expected output is to have both ID elements encapsulated 
> in a single <a></a> element.
> Any help is appreciated. Thanks.
