Re: [xsl] Grouping elements that have at least one common value

Subject: Re: [xsl] Grouping elements that have at least one common value
From: "Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 26 Jun 2023 06:05:56 -0000

Hi all,

I went ahead with Martin's solution and implemented all the business rules
around that "grouping".
It's fast, but I realized that it can generate duplicated groups on my big
file, which is quite a problem (some people in my company will have to
spend about 60 days working on that output as an Excel file).

It's not that easy to reproduce, but here is an example. With this input:
<FORMS>
    <GRCHOIX>
        <CHOIX CODE="choix-10"/>
        <CHOIX CODE="choix-11"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-12"/>
        <CHOIX CODE="choix-14"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-12"/>
        <CHOIX CODE="choix-15"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-2"/>
        <CHOIX CODE="choix-8"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-3"/>
        <CHOIX CODE="choix-5"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-22"/>
        <CHOIX CODE="choix-3"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-10"/>
        <CHOIX CODE="choix-13"/>
        <CHOIX CODE="choix-18"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-11"/>
        <CHOIX CODE="choix-16"/>
    </GRCHOIX>
    <GRCHOIX>
        <CHOIX CODE="choix-12"/>
        <CHOIX CODE="choix-16"/>
    </GRCHOIX>
</FORMS>

The output contains the GRCHOIX "choix-10/choix-13/choix-18" twice (once in the
first GROUP and once in a GROUP of its own), plus an empty trailing GROUP:

<FORMS>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-10"/>
         <CHOIX CODE="choix-11"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-12"/>
         <CHOIX CODE="choix-14"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-12"/>
         <CHOIX CODE="choix-15"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-10"/>
         <CHOIX CODE="choix-13"/>
         <CHOIX CODE="choix-18"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-11"/>
         <CHOIX CODE="choix-16"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-12"/>
         <CHOIX CODE="choix-16"/>
      </GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-2"/>
         <CHOIX CODE="choix-8"/>
      </GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-3"/>
         <CHOIX CODE="choix-5"/>
      </GRCHOIX>
      <GRCHOIX>
         <CHOIX CODE="choix-22"/>
         <CHOIX CODE="choix-3"/>
      </GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX>
         <CHOIX CODE="choix-10"/>
         <CHOIX CODE="choix-13"/>
         <CHOIX CODE="choix-18"/>
      </GRCHOIX>
   </GROUP>
   <GROUP/>
</FORMS>
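
To spot this kind of problem on the big file without reading it by eye, a small check outside XSLT can help. The following is only a sketch: the function name check_groups is hypothetical, the sample is an abbreviated copy of the output above, and it assumes that a GRCHOIX can be identified by the ordered list of its CODE values (which would not hold if the input legitimately contains two identical GRCHOIX).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the output above (same structure, fewer GRCHOIX).
OUTPUT = """
<FORMS>
   <GROUP>
      <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-11"/></GRCHOIX>
      <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-13"/><CHOIX CODE="choix-18"/></GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX><CHOIX CODE="choix-2"/><CHOIX CODE="choix-8"/></GRCHOIX>
   </GROUP>
   <GROUP>
      <GRCHOIX><CHOIX CODE="choix-10"/><CHOIX CODE="choix-13"/><CHOIX CODE="choix-18"/></GRCHOIX>
   </GROUP>
   <GROUP/>
</FORMS>
"""


def check_groups(xml_text):
    """Return (duplicates, empty) for a FORMS/GROUP/GRCHOIX document.

    duplicates maps a tuple of CODE values to the 1-based positions of the
    GROUP elements that contain a GRCHOIX with exactly those codes.
    empty lists the 1-based positions of GROUP elements with no GRCHOIX.
    """
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    empty = []
    for pos, group in enumerate(root.findall("GROUP"), start=1):
        grchoix_list = group.findall("GRCHOIX")
        if not grchoix_list:
            empty.append(pos)
        for grchoix in grchoix_list:
            # Assumption: the ordered CODE values identify a GRCHOIX.
            key = tuple(c.get("CODE") for c in grchoix.findall("CHOIX"))
            seen[key].append(pos)
    duplicates = {key: positions for key, positions in seen.items() if len(positions) > 1}
    return duplicates, empty


duplicates, empty = check_groups(OUTPUT)
print("GRCHOIX present in several GROUPs:", duplicates)
print("empty GROUP positions:", empty)
```

On the abbreviated sample this reports the "choix-10/choix-13/choix-18" GRCHOIX in GROUP 1 and GROUP 3, and GROUP 4 as empty.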

I tried to figure out why, but there is something I don't understand in the
algorithm:
- xsl:iterate iterates over the $groups elements.
- when it goes into the xsl:otherwise branch, it creates a <GROUP> in the output.
Does the "grouping" strategy depend on element order? Or can the same GROUP
element still be fed at a later iteration (unlike with xsl:for-each)?
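
As a cross-check of the expected result that does not depend on element order, the grouping can be sketched as connected components with a union-find structure. This is not Martin's stylesheet and it does not explain the duplicate; it is a minimal sketch under the assumption that two GRCHOIX belong together whenever a chain of shared CODE values links them. The names group_by_shared_codes and ITEMS are hypothetical, and ITEMS is the input above with each GRCHOIX reduced to its codes.

```python
def group_by_shared_codes(items):
    """Group item indices that are linked, directly or through a chain,
    by at least one shared code. Returns a sorted list of index lists."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # code -> index of the first item seen with that code
    for i, codes in enumerate(items):
        for code in codes:
            if code in first_owner:
                # Merge the current item's set with the set that owns this code.
                parent[find(i)] = find(first_owner[code])
            else:
                first_owner[code] = i

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# The nine GRCHOIX of the sample input, in document order (0-based indices).
ITEMS = [
    ["choix-10", "choix-11"],
    ["choix-12", "choix-14"],
    ["choix-12", "choix-15"],
    ["choix-2", "choix-8"],
    ["choix-3", "choix-5"],
    ["choix-22", "choix-3"],
    ["choix-10", "choix-13", "choix-18"],
    ["choix-11", "choix-16"],
    ["choix-12", "choix-16"],
]

groups = group_by_shared_codes(ITEMS)
print(groups)  # [[0, 1, 2, 6, 7, 8], [3], [4, 5]]
```

In this input the last GRCHOIX (choix-12/choix-16) is the one that links the "choix-12" elements to the "choix-10/choix-11/choix-16" elements, so any single forward pass has to merge two partially built groups at that point. Whether that merge is where the duplicate comes from is something I have not verified.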

I also gave Michael's transitive closure algorithm a try (see next mail).

Cheers
Matthieu

On Mon, 19 Jun 2023 at 22:44, Joel Kalvesmaki director@xxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Matthieu,
>
> Currently TAN is a static download, either through github or the
> website. Making it available through package repos is a future to-do
> item, as well as better organization into subpackages and breaking out
> dependencies. The license was designed to encourage other developers to
> develop their own variations on the code, as needed.
>
> A new function proposed for XPath 4.0, currently transitive-closure()
> (name under discussion, https://github.com/qt4cg/qtspecs/issues/554), is
> likely to make this task more tractable, and concisely expressed.
>
> Best wishes,
>
> jk
>
>
> On 2023-06-19 01:38, Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx wrote:
> > Hi Joel,
> >
> > Thanks for the link to the TAN library. I'm not sure I can use it for my
> > purpose, because it groups the text content of child nodes. But I
> > guess I could adapt my input or the function code.
> >
> > BTW it looks like TAN functions use a lot of other TAN functions,
> > which means I should get the whole TAN lib to make it work on my
> > project.
> >
> > How is it distributed? Using HTTP would probably work, but it's not
> > that safe when running on a company server that might not be
> > connected to the internet (or is behind proxy restrictions, for example).
> > Is the TAN library published as a Maven artifact or something like that?
> >
> > Anyway, Martin's solution works really well and performance is really
> > good, so I guess I will stick with this solution for my project.
> >
> > Thanks again Martin !
> >
> > Now I have to deal with business rules around this "grouping" :)
> >
> > Thank you all for your time,
> >
> > Cheers
> >
> > Matthieu
> >
> > On Fri, 16 Jun 2023 at 16:54, Joel Kalvesmaki director@xxxxxxxxxxxxx
> > <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> Hi Matthieu,
> >>
> >> You may want to look at tan:group-elements-by-shared-node-values().
> >>
> >> Overview:
> >>
> >> https://textalign.net/release/TAN-2021/guidelines/xhtml/ch13s02.xhtml#function-group-elements-by-shared-node-values
> >>
> >> Code (starting line 272):
> >>
> >> https://github.com/textalign/TAN-2021/blob/master/functions/nodes/TAN-fn-nodes-standard.xsl
> >>
> >> Joel
> >>
> >> On 2023-06-16 05:09, Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx
> >> wrote:
> >>> Hi all,
> >>>
> >>> I need to group elements that have at least one common value :
> >>>
> >>> <FORMS>
> >>>
>
> --
> Joel Kalvesmaki
> Director, Text Alignment Network
> http://textalign.net
>
>
>

--
Matthieu Ricaud-Dussarget
+33 6.63.25.95.58
