Re: [xsl] Grouping elements that have at least one common value

Subject: Re: [xsl] Grouping elements that have at least one common value
From: "Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 26 Jun 2023 16:16:49 -0000
Hi Michael,

Thanks for your feedback.
Yes, true: the xsl:break in the iteration over 1 to 100 000 000 keeps it
from being greedy; it runs just as many iterations as needed, no more.
I added <xsl:assert test=". lt 1000"/> and it never fires. I also added an
xsl:message to see the value, which is not high at all (1, 2, ... less than
10).

About the "except": I now use a sequence of generate-id() values to exclude
already processed nodes:

<xsl:template name="els:process">
  <xsl:param name="GRCHOIX" as="element()*"/>
  <xsl:param name="processed-GRCHOIX.ids" select="()" as="xs:string*"/>
  <xsl:if test="not(empty($GRCHOIX))">
    <xsl:variable name="start-node" select="$GRCHOIX[1]" as="element(GRCHOIX)?"/>
    <xsl:variable name="start-node.transitive-closure" as="element(GRCHOIX)*"
      select="els:transitive-closure($start-node)"/>
    <GROUP>
      <xsl:sequence select="$start-node.transitive-closure"/>
    </GROUP>
    <xsl:call-template name="els:process">
      <!--<xsl:with-param name="GRCHOIX" select="$GRCHOIX except $start-node.transitive-closure"/>-->
      <xsl:with-param name="GRCHOIX"
        select="$GRCHOIX[not(generate-id() = $processed-GRCHOIX.ids)]"/>
      <xsl:with-param name="processed-GRCHOIX.ids"
        select="($processed-GRCHOIX.ids, $start-node.transitive-closure/generate-id())"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>

It doesn't look like it really changed the performance.
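
A possible pitfall in the template above: the recursive call filters
$GRCHOIX with $processed-GRCHOIX.ids as it was on entry, before the ids of
the current closure have been appended. As far as I can tell, the start node
is then still first in the sequence on the next call, so each closure would
be computed (and each GROUP emitted) twice. A minimal sketch of a variant
that builds the new id list first; the variable names "closure" and
"processed.new" are made up for the illustration, and the xsl, xs and els
prefixes are assumed to be bound as in the original stylesheet:

```xslt
<xsl:template name="els:process">
  <xsl:param name="GRCHOIX" as="element()*"/>
  <xsl:param name="processed-GRCHOIX.ids" select="()" as="xs:string*"/>
  <xsl:if test="exists($GRCHOIX)">
    <xsl:variable name="closure" as="element(GRCHOIX)*"
      select="els:transitive-closure($GRCHOIX[1])"/>
    <!-- hypothetical name: ids seen so far, INCLUDING the current group;
         the start node is added explicitly so the recursion always shrinks,
         even if the closure happens not to contain it -->
    <xsl:variable name="processed.new" as="xs:string*"
      select="($processed-GRCHOIX.ids,
               generate-id($GRCHOIX[1]),
               $closure/generate-id())"/>
    <GROUP>
      <xsl:sequence select="$closure"/>
    </GROUP>
    <xsl:call-template name="els:process">
      <!-- filter with the updated list, not the one received on entry -->
      <xsl:with-param name="GRCHOIX"
        select="$GRCHOIX[not(generate-id() = $processed.new)]"/>
      <xsl:with-param name="processed-GRCHOIX.ids" select="$processed.new"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
```

Since $GRCHOIX is already filtered on every call, comparing against the ids
of the current closure alone would probably be enough, which keeps the
comparison list short.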

There's another "except" inside els:transitive-closure:

<xsl:variable name="next-nodes" select="($origin ! els:step(.)) except $result"/>

Maybe this one is greedy too?

So I applied the same method, but it didn't get any faster either:

<xsl:function name="els:transitive-closure" as="node()*">
  <xsl:param name="start-node" as="node()"/>
  <xsl:iterate select="1 to 100000000">
    <xsl:param name="result" as="node()*" select="()"/>
    <xsl:param name="origin" as="node()*" select="$start-node"/>
    <xsl:param name="result.ids" as="xs:string*" select="()"/>
    <xsl:variable name="next-nodes"
      select="($origin ! els:step(.))[not(generate-id() = $result.ids)]"/>
    <xsl:assert test=". lt 1000"/>
    <xsl:choose>
      <xsl:when test="empty($next-nodes)">
        <xsl:sequence select="$result"/>
        <xsl:break/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="result.new" select="$result | $next-nodes"/>
        <xsl:next-iteration>
          <xsl:with-param name="result" select="$result.new"/>
          <xsl:with-param name="origin" select="$next-nodes"/>
          <xsl:with-param name="result.ids" select="$result.new/generate-id()"/>
        </xsl:next-iteration>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:iterate>
</xsl:function>
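
The test "generate-id() = $result.ids" is a general comparison against a
growing sequence of strings, so unless the processor optimizes it, each
check may scan the whole list. One thing that could be tried is keeping the
seen ids as keys of an XPath 3.1 map and testing with map:contains. This is
only a sketch under assumptions: the parameter name "seen" is hypothetical,
the prefix map is assumed to be bound to
http://www.w3.org/2005/xpath-functions/map, and whether it is actually
faster depends on the processor:

```xslt
<xsl:function name="els:transitive-closure" as="node()*">
  <xsl:param name="start-node" as="node()"/>
  <xsl:iterate select="1 to 100000000">
    <xsl:param name="result" as="node()*" select="()"/>
    <xsl:param name="origin" as="node()*" select="$start-node"/>
    <!-- hypothetical parameter: ids of the nodes in $result, as map keys -->
    <xsl:param name="seen" as="map(xs:string, xs:boolean)" select="map{}"/>
    <!-- keep only nodes whose id is not yet a key of $seen -->
    <xsl:variable name="next-nodes" as="node()*"
      select="($origin ! els:step(.))[not(map:contains($seen, generate-id(.)))]"/>
    <xsl:assert test=". lt 1000"/>
    <xsl:choose>
      <xsl:when test="empty($next-nodes)">
        <xsl:sequence select="$result"/>
        <xsl:break/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:next-iteration>
          <xsl:with-param name="result" select="$result | $next-nodes"/>
          <xsl:with-param name="origin" select="$next-nodes"/>
          <!-- add the new ids; duplicate keys are tolerated because
               map:merge defaults to the use-first policy -->
          <xsl:with-param name="seen"
            select="map:merge(($seen, $next-nodes ! map{generate-id(.): true()}))"/>
        </xsl:next-iteration>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:iterate>
</xsl:function>
```

The map is extended with the new nodes only on each pass, instead of
recomputing generate-id() for the whole result as $result.new/generate-id()
does.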

I haven't tried Saxon profiling or the Oxygen debugger's hotspot view on
the big file (so far I haven't let the transformation run to the end).
But running the Oxygen debugger on the small sample from this mail gives 2
hotspots:
- call to els:transitive-closure($start-node)
- and key('getGrchoixbyChoixCode', $start/CHOIX/@CODE, $root)

I also added xsl:message everywhere and it confirms that the call to
els:transitive-closure is quite greedy.
Maybe it's this expression: "$origin ! els:step(.)"?
I guess memory is OK, but I have 37 000 GRCHOIX in my input and
after about 5 min it looks like only 1000 have been processed.
The time isn't linear, though: each step should get shorter and shorter.
I'll run the transformation all the way to the end tonight to see how long
it takes.
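
On "$origin ! els:step(.)": this calls els:step once per origin node and
concatenates the results without removing duplicates. If els:step is
essentially a wrapper around the key lookup listed in the hotspots (an
assumption, since its body is not shown in this mail), the whole frontier
could be looked up in one call, because key() accepts a sequence of values
and returns the matching nodes in document order without duplicates. A
sketch, where the function name els:step-all and the shape of the key
declaration are both hypothetical:

```xslt
<!-- assumed shape of the existing key (shown for reference only):
     each GRCHOIX is indexed by the CODE of its CHOIX children -->
<xsl:key name="getGrchoixbyChoixCode" match="GRCHOIX" use="CHOIX/@CODE"/>

<!-- hypothetical helper: one key() call for a whole set of origin nodes -->
<xsl:function name="els:step-all" as="element(GRCHOIX)*">
  <xsl:param name="origin" as="element(GRCHOIX)*"/>
  <xsl:sequence select="
    if (empty($origin)) then ()
    else key('getGrchoixbyChoixCode', $origin/CHOIX/@CODE, root($origin[1]))"/>
</xsl:function>
```

In els:transitive-closure the variable would then read something like
select="els:step-all($origin)[not(generate-id() = $result.ids)]".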

Cheers
Matthieu RICAUD-DUSSARGET



-- 
Matthieu Ricaud-Dussarget
+33 6.63.25.95.58
