Re: [xsl] Grouping elements that have at least one common value

Subject: Re: [xsl] Grouping elements that have at least one common value
From: "John Lumley john@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 16 Jun 2023 13:29:24 -0000
Doesnbt the xsl:iterate solution need an xsl:break under the xsl:when
test=bempty($next-nodes)b, Mike?

John Lumley

Sent from my iPad

> On 16 Jun 2023, at 13:00, Michael Kay michaelkay90@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> o;?This feels to me as being not so much a grouping problem, as a transitive
closure problem. There's a relation between two elements ("have at least one
value in common") and you're looking for the transitive closure of that
relation to form one group.
>
> For efficiency you first need to find a way of evaluating the basic relation
efficiently, and then you need an efficient way of finding its transitive
closure.
>
> For the basic relation, start by defining a key
>
> <xsl:key name="k" match="GRCHOIX" use="CHOIX/@CODE"/>
>
> Then key('k', 'choix-1') will find all GRCHOIX elements that have 'choix-1'
as one of their values.
>
> And key('k', //GRCHOIX/@CODE[.="grchoix-1"]/CHOIX/@CODE) will find all
GRCHOIX elements that have either 'choix-1' or 'choix-2' as one of their
values.
>
> So the base relation is defined by a function:
>
> <xsl:function name="f:step"> as="element(GRCHOIX)*">
>   <xsl:param name="start" as="element(GRCHOIX)"/>
>   <xsl:sequence select="key('k', $start/CHOIX/@CODE)"/>
> </xsl:function>
>
> Then you need to find the transitive closure of this step.
>
> As it happens we've just three days ago defined a transitive closure
function for inclusion in XPath 4.0, which you can find here:
>
>
https://qt4cg.org/pr/533/xpath-functions-40/Overview.html#func-transitive-clo
sure
>
> Unfortunately, you need it now! You can implement it yourself, but it's not
particularly easy. One way might be like this:
>
> <xsl:iterate select="1 to 100000000">
>    <xsl:param name="result" as="node()*" select="()"/>
>    <xsl:param name="origin" as="node()*" select="$start-node"/>
>    <xsl:variable name="next-nodes" select="($origin ! f:step(.)) except
$result"/>
>    <xsl:choose>
>       <xsl:when test="empty($next-nodes)">
>            <xsl:sequence select="$result"/>
>       </xsl:when>
>       <xsl:otherwise>
>             <xsl:next-iteration>
>                  <xsl:with-param name="result" select="$result |
$next-ndoes"/>
>                  <xsl:with-param name="origin" select="$next-nodes"/>
>            </
>       </
>  </
>
> which basically keeps looking for more nodes satisfying the relation until
no more are found.
>
> This gives you one group starting from one specific $start-node, which you
presumably select at random.
>
> Then repeat, starting from a different $start-node, one that has not already
been allocated to a group; until no more nodes are left.
>
> NOT TESTED!
>
> Michael Kay
> Saxonica
>
>
>
>> grchoix-1
>
>
>
>> On 16 Jun 2023, at 12:09, Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> I need to group elements that have at least one common value :
>>
>> <FORMS>
>>   <!--CASE 1-->
>>   <GRCHOIX CODE="grchoix-1">
>>     <CHOIX CODE="choix-1"/>
>>     <CHOIX CODE="choix-2"/>
>>   </GRCHOIX>
>>   <GRCHOIX CODE="grchoix-2">
>>     <CHOIX CODE="choix-1"/>
>>     <CHOIX CODE="choix-2"/>
>>     <CHOIX CODE="choix-3"/>
>>   </GRCHOIX>
>>   <GRCHOIX CODE="grchoix-3">
>>     <CHOIX CODE="choix-2"/>
>>     <CHOIX CODE="choix-3"/>
>>     <CHOIX CODE="choix-4"/>
>>   </GRCHOIX>
>>   <!--CASE 2-->
>>   <GRCHOIX CODE="grchoix-A">
>>     <CHOIX CODE="choix-a"/>
>>     <CHOIX CODE="choix-b"/>
>>   </GRCHOIX>
>>   <GRCHOIX CODE="grchoix-B">
>>     <CHOIX CODE="choix-b"/>
>>     <CHOIX CODE="choix-c"/>
>>   </GRCHOIX>
>>   <GRCHOIX CODE="grchoix-C">
>>     <CHOIX CODE="choix-d"/>
>>     <CHOIX CODE="choix-e"/>
>>   </GRCHOIX>
>>   <GRCHOIX CODE="grchoix-D">
>>     <CHOIX CODE="choix-c"/>
>>     <CHOIX CODE="choix-d"/>
>>   </GRCHOIX>
>> </FORMS>
>>
>> * grchoix-1 and grchoix-2 have 2 common values : choix-1, choix-2 b they
must belong the same group
>> * grchoix-2 and grchoix-3 have 2 common values : choix-2, choix-3 b they
must belong the same group
>> b At the end : grchoix-1, grchoix-2 and grchoix-3 must belong the same
group
>>
>> * grchoix-A and grchoix-B have 1 common value choix-b b they must belong
the same group
>> * grchoix-B and grchoix-C have 0 common values
>> * but as grchoix-D have
>>    - 1 common value (choix-c) with grchoix-B
>>    - 1 one common value (choix-d) with grchoix-C
>>   Then grchoix-B and grchoix-C must also belong the same GROUP
>> b At the end : grchoix-A, grchoix-B, grchoix-C and grchoix-D must belong
the same group
>>
>> The expected XML output is 2 Groups :
>> <FORMS>
>>   <!--CASE 1-->
>>   <GROUP>
>>     <GRCHOIX CODE="grchoix-1">
>>       <CHOIX CODE="choix-1"/>
>>       <CHOIX CODE="choix-2"/>
>>     </GRCHOIX>
>>     <GRCHOIX CODE="grchoix-2">
>>       <CHOIX CODE="choix-1"/>
>>       <CHOIX CODE="choix-2"/>
>>       <CHOIX CODE="choix-3"/>
>>     </GRCHOIX>
>>     <GRCHOIX CODE="grchoix-3">
>>       <CHOIX CODE="choix-2"/>
>>       <CHOIX CODE="choix-3"/>
>>       <CHOIX CODE="choix-4"/>
>>     </GRCHOIX>
>>   </GROUP>
>>   <!--CASE 2-->
>>   <GROUP>
>>     <GRCHOIX CODE="grchoix-A">
>>       <CHOIX CODE="choix-a"/>
>>       <CHOIX CODE="choix-b"/>
>>     </GRCHOIX>
>>     <GRCHOIX CODE="grchoix-B">
>>       <CHOIX CODE="choix-b"/>
>>       <CHOIX CODE="choix-c"/>
>>     </GRCHOIX>
>>     <GRCHOIX CODE="grchoix-C">
>>       <CHOIX CODE="choix-d"/>
>>       <CHOIX CODE="choix-e"/>
>>     </GRCHOIX>
>>     <GRCHOIX CODE="grchoix-D">
>>       <CHOIX CODE="choix-c"/>
>>       <CHOIX CODE="choix-d"/>
>>     </GRCHOIX>
>>   </GROUP>
>> </FORMS>
>>
>> I'm really not sure how to go ahead with such a grouping problem.
>> I've tried something that work on my sample but it's probably too
complicated (maybe it doesnt' work all cases) and most of all it has drastic
bad performances !
>> I'm using fro-each-group on multiple values, wich causes all combination to
be executed and make the internal variables huge (from A 8Mo input I get a 500
Mo size of intermediate step)
>> At this point I get without surprise a saxon internal error on my real
(rather big) XML input.
>>
>> See my (so) bad solution in a next message (I get a message size to long
when I copy it here)
>>
>> I'm pretty sure there's something quite more elegant to do, did you ever
had to do such a grouping ?
>> BTW I'm using XSLT 3.
>>
>> Any suggestion/help appreciated
>>
>>  Cheers,
>>
>> Matthieu Ricaud-Dussarget
>>
>>
>>
>> XSL-List info and archive
>> EasyUnsubscribe (by email)
>
> XSL-List info and archive
> EasyUnsubscribe (by email)

Current Thread