Re: [xsl] Grouping elements that have at least one common value

Subject: Re: [xsl] Grouping elements that have at least one common value
From: "Michael Kay michaelkay90@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 16 Jun 2023 12:00:00 -0000
This feels to me as being not so much a grouping problem, as a transitive
closure problem. There's a relation between two elements ("have at least one
value in common") and you're looking for the transitive closure of that
relation to form one group.

For efficiency you first need to find a way of evaluating the basic relation
efficiently, and then you need an efficient way of finding its transitive
closure.

For the basic relation, start by defining a key

<xsl:key name="k" match="GRCHOIX" use="CHOIX/@CODE"/>

Then key('k', 'choix-1') will find all GRCHOIX elements that have 'choix-1' as
one of their values.

And key('k', //GRCHOIX/@CODE[.="grchoix-1"]/CHOIX/@CODE) will find all GRCHOIX
elements that have either 'choix-1' or 'choix-2' as one of their values.

So the base relation is defined by a function:

<xsl:function name="f:step"> as="element(GRCHOIX)*">
  <xsl:param name="start" as="element(GRCHOIX)"/>
  <xsl:sequence select="key('k', $start/CHOIX/@CODE)"/>
</xsl:function>

Then you need to find the transitive closure of this step.

As it happens we've just three days ago defined a transitive closure function
for inclusion in XPath 4.0, which you can find here:

https://qt4cg.org/pr/533/xpath-functions-40/Overview.html#func-transitive-clo
sure

Unfortunately, you need it now! You can implement it yourself, but it's not
particularly easy. One way might be like this:

<xsl:iterate select="1 to 100000000">
   <xsl:param name="result" as="node()*" select="()"/>
   <xsl:param name="origin" as="node()*" select="$start-node"/>
   <xsl:variable name="next-nodes" select="($origin ! f:step(.)) except
$result"/>
   <xsl:choose>
      <xsl:when test="empty($next-nodes)">
           <xsl:sequence select="$result"/>
      </xsl:when>
      <xsl:otherwise>
            <xsl:next-iteration>
                 <xsl:with-param name="result" select="$result |
$next-ndoes"/>
                 <xsl:with-param name="origin" select="$next-nodes"/>
           </
      </
 </

which basically keeps looking for more nodes satisfying the relation until no
more are found.

This gives you one group starting from one specific $start-node, which you
presumably select at random.

Then repeat, starting from a different $start-node, one that has not already
been allocated to a group; until no more nodes are left.

NOT TESTED!

Michael Kay
Saxonica



> grchoix-1



> On 16 Jun 2023, at 12:09, Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> I need to group elements that have at least one common value :
>
> <FORMS>
>   <!--CASE 1-->
>   <GRCHOIX CODE="grchoix-1">
>     <CHOIX CODE="choix-1"/>
>     <CHOIX CODE="choix-2"/>
>   </GRCHOIX>
>   <GRCHOIX CODE="grchoix-2">
>     <CHOIX CODE="choix-1"/>
>     <CHOIX CODE="choix-2"/>
>     <CHOIX CODE="choix-3"/>
>   </GRCHOIX>
>   <GRCHOIX CODE="grchoix-3">
>     <CHOIX CODE="choix-2"/>
>     <CHOIX CODE="choix-3"/>
>     <CHOIX CODE="choix-4"/>
>   </GRCHOIX>
>   <!--CASE 2-->
>   <GRCHOIX CODE="grchoix-A">
>     <CHOIX CODE="choix-a"/>
>     <CHOIX CODE="choix-b"/>
>   </GRCHOIX>
>   <GRCHOIX CODE="grchoix-B">
>     <CHOIX CODE="choix-b"/>
>     <CHOIX CODE="choix-c"/>
>   </GRCHOIX>
>   <GRCHOIX CODE="grchoix-C">
>     <CHOIX CODE="choix-d"/>
>     <CHOIX CODE="choix-e"/>
>   </GRCHOIX>
>   <GRCHOIX CODE="grchoix-D">
>     <CHOIX CODE="choix-c"/>
>     <CHOIX CODE="choix-d"/>
>   </GRCHOIX>
> </FORMS>
>
> * grchoix-1 and grchoix-2 have 2 common values : choix-1, choix-2 b they
must belong the same group
> * grchoix-2 and grchoix-3 have 2 common values : choix-2, choix-3 b they
must belong the same group
> b At the end : grchoix-1, grchoix-2 and grchoix-3 must belong the same
group
>
> * grchoix-A and grchoix-B have 1 common value choix-b b they must belong
the same group
> * grchoix-B and grchoix-C have 0 common values
> * but as grchoix-D have
>    - 1 common value (choix-c) with grchoix-B
>    - 1 one common value (choix-d) with grchoix-C
>   Then grchoix-B and grchoix-C must also belong the same GROUP
> b At the end : grchoix-A, grchoix-B, grchoix-C and grchoix-D must belong
the same group
>
> The expected XML output is 2 Groups :
> <FORMS>
>   <!--CASE 1-->
>   <GROUP>
>     <GRCHOIX CODE="grchoix-1">
>       <CHOIX CODE="choix-1"/>
>       <CHOIX CODE="choix-2"/>
>     </GRCHOIX>
>     <GRCHOIX CODE="grchoix-2">
>       <CHOIX CODE="choix-1"/>
>       <CHOIX CODE="choix-2"/>
>       <CHOIX CODE="choix-3"/>
>     </GRCHOIX>
>     <GRCHOIX CODE="grchoix-3">
>       <CHOIX CODE="choix-2"/>
>       <CHOIX CODE="choix-3"/>
>       <CHOIX CODE="choix-4"/>
>     </GRCHOIX>
>   </GROUP>
>   <!--CASE 2-->
>   <GROUP>
>     <GRCHOIX CODE="grchoix-A">
>       <CHOIX CODE="choix-a"/>
>       <CHOIX CODE="choix-b"/>
>     </GRCHOIX>
>     <GRCHOIX CODE="grchoix-B">
>       <CHOIX CODE="choix-b"/>
>       <CHOIX CODE="choix-c"/>
>     </GRCHOIX>
>     <GRCHOIX CODE="grchoix-C">
>       <CHOIX CODE="choix-d"/>
>       <CHOIX CODE="choix-e"/>
>     </GRCHOIX>
>     <GRCHOIX CODE="grchoix-D">
>       <CHOIX CODE="choix-c"/>
>       <CHOIX CODE="choix-d"/>
>     </GRCHOIX>
>   </GROUP>
> </FORMS>
>
> I'm really not sure how to go ahead with such a grouping problem.
> I've tried something that work on my sample but it's probably too
complicated (maybe it doesnt' work all cases) and most of all it has drastic
bad performances !
> I'm using fro-each-group on multiple values, wich causes all combination to
be executed and make the internal variables huge (from A 8Mo input I get a 500
Mo size of intermediate step)
> At this point I get without surprise a saxon internal error on my real
(rather big) XML input.
>
> See my (so) bad solution in a next message (I get a message size to long
when I copy it here)
>
> I'm pretty sure there's something quite more elegant to do, did you ever had
to do such a grouping ?
> BTW I'm using XSLT 3.
>
> Any suggestion/help appreciated
>
>  Cheers,
>
> Matthieu Ricaud-Dussarget
>
>
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/3500899> (by
email <>)

Current Thread