Subject: Re: [xsl] Grouping elements that have at least one common value
From: "Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 26 Jun 2023 16:16:49 -0000

Hi Michael,

Thanks for your feedback.

Yes, true: the xsl:break in the iteration over 1 to 100 000 000 keeps it cheap, with just enough iterations and no more. I added <xsl:assert test=". lt 1000"/> and it is never triggered. I also added an xsl:message to see the value, which is not high at all (1, 2, ... less than 10).

About the except: I now use a sequence of generated ids to exclude the already processed nodes:

    <xsl:template name="els:process">
      <xsl:param name="GRCHOIX" as="element()*"/>
      <xsl:param name="processed-GRCHOIX.ids" select="()" as="xs:string*"/>
      <xsl:if test="not(empty($GRCHOIX))">
        <xsl:variable name="start-node" select="$GRCHOIX[1]" as="element(GRCHOIX)?"/>
        <xsl:variable name="start-node.transitive-closure" as="element(GRCHOIX)*"
          select="els:transitive-closure($start-node)"/>
        <GROUP>
          <xsl:sequence select="$start-node.transitive-closure"/>
        </GROUP>
        <xsl:call-template name="els:process">
          <!--<xsl:with-param name="GRCHOIX" select="$GRCHOIX except $start-node.transitive-closure"/>-->
          <xsl:with-param name="GRCHOIX"
            select="$GRCHOIX[not(generate-id() = $processed-GRCHOIX.ids)]"/>
          <xsl:with-param name="processed-GRCHOIX.ids"
            select="($processed-GRCHOIX.ids, $start-node.transitive-closure/generate-id())"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:template>

It looks like it didn't really change the performance.

There's another "except" within els:transitive-closure:

    <xsl:variable name="next-nodes" select="($origin ! els:step(.)) except $result"/>

Maybe this one is costly too? So I applied the same method, but it didn't go any faster either:

    <xsl:function name="els:transitive-closure" as="node()*">
      <xsl:param name="start-node" as="node()"/>
      <xsl:iterate select="1 to 100000000">
        <xsl:param name="result" as="node()*" select="()"/>
        <xsl:param name="origin" as="node()*" select="$start-node"/>
        <xsl:param name="result.ids" as="xs:string*" select="()"/>
        <xsl:variable name="next-nodes"
          select="($origin ! els:step(.))[not(generate-id() = $result.ids)]"/>
        <xsl:assert test=". lt 1000"/>
        <xsl:choose>
          <xsl:when test="empty($next-nodes)">
            <xsl:sequence select="$result"/>
            <xsl:break/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:variable name="result.new" select="$result | $next-nodes"/>
            <xsl:next-iteration>
              <xsl:with-param name="result" select="$result.new"/>
              <xsl:with-param name="origin" select="$next-nodes"/>
              <xsl:with-param name="result.ids" select="$result.new/generate-id()"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:function>

I haven't tried Saxon profiling or the Oxygen debugger with hotspots on the big file (since I haven't yet run the transformation to the end). But activating the Oxygen debugger on the small sample from this mail gives 2 hotspots:

- the call to els:transitive-closure($start-node)
- and key('getGrchoixbyChoixCode', $start/CHOIX/@CODE, $root)

I also added xsl:message everywhere, and it confirms that the call to els:transitive-closure is quite costly. Maybe this expression: "$origin ! els:step(.)"?

I guess it's OK about memory, but I have 37 000 GRCHOIX in my input and after about 5 min it looks like 1000 have been processed. It's not linear; it should get shorter and shorter.

I'll launch the transformation fully to the end tonight to see how long it takes.

Cheers
Matthieu RICAUD-DUSSARGET

--
Matthieu Ricaud-Dussarget
+33 6.63.25.95.58
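
A note on the els:process template as posted: the GRCHOIX filter inside xsl:with-param reads $processed-GRCHOIX.ids as it was on entry to the template, that is, before the ids of the current closure are appended. The nodes of the group just emitted therefore appear to stay in the sequence for one more round. Below is a minimal sketch of one way to avoid that, by binding the closure ids to a variable first and filtering against them. Since $GRCHOIX already shrinks on every call, the sketch filters only against the current closure and drops the accumulated id list. The key definition, the body of els:step, the GROUPS wrapper, the match="/*" entry template and the urn:example:els namespace are assumptions made for illustration; the message only shows the key being called. It is not a claim about where the time is actually spent.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:els="urn:example:els"
    exclude-result-prefixes="xs els">

  <!-- ASSUMPTION: key definition guessed from the key() call in the message. -->
  <xsl:key name="getGrchoixbyChoixCode" match="GRCHOIX" use="CHOIX/@CODE"/>

  <!-- ASSUMPTION: hypothetical entry point and GROUPS wrapper element. -->
  <xsl:template match="/*">
    <GROUPS>
      <xsl:call-template name="els:process">
        <xsl:with-param name="GRCHOIX" select=".//GRCHOIX"/>
      </xsl:call-template>
    </GROUPS>
  </xsl:template>

  <xsl:template name="els:process">
    <xsl:param name="GRCHOIX" as="element(GRCHOIX)*"/>
    <xsl:if test="exists($GRCHOIX)">
      <!-- Union with the start node so each call removes at least one node,
           even if the start node has no CHOIX/@CODE. -->
      <xsl:variable name="closure" as="element(GRCHOIX)*"
        select="$GRCHOIX[1] | els:transitive-closure($GRCHOIX[1])"/>
      <!-- Bind the ids BEFORE the recursive call so the filter sees them. -->
      <xsl:variable name="closure.ids" as="xs:string*"
        select="$closure/generate-id()"/>
      <GROUP>
        <xsl:sequence select="$closure"/>
      </GROUP>
      <xsl:call-template name="els:process">
        <xsl:with-param name="GRCHOIX"
          select="$GRCHOIX[not(generate-id() = $closure.ids)]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- ASSUMPTION: els:step is not shown in the message; this body is a guess
       built from the key() hotspot it mentions. Requires a document-node root. -->
  <xsl:function name="els:step" as="element(GRCHOIX)*">
    <xsl:param name="start" as="element(GRCHOIX)"/>
    <xsl:sequence
      select="key('getGrchoixbyChoixCode', $start/CHOIX/@CODE, root($start))"/>
  </xsl:function>

  <!-- Same shape as the first version quoted in the message (with except). -->
  <xsl:function name="els:transitive-closure" as="element(GRCHOIX)*">
    <xsl:param name="start-node" as="element(GRCHOIX)"/>
    <xsl:iterate select="1 to 100000000">
      <xsl:param name="result" as="element(GRCHOIX)*" select="()"/>
      <xsl:param name="origin" as="element(GRCHOIX)*" select="$start-node"/>
      <xsl:variable name="next-nodes" as="element(GRCHOIX)*"
        select="($origin ! els:step(.)) except $result"/>
      <xsl:choose>
        <xsl:when test="empty($next-nodes)">
          <xsl:sequence select="$result"/>
          <xsl:break/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:next-iteration>
            <xsl:with-param name="result" select="$result | $next-nodes"/>
            <xsl:with-param name="origin" select="$next-nodes"/>
          </xsl:next-iteration>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:function>

</xsl:stylesheet>
```

With this shape each group should be emitted once, and the id list compared in the predicate stays the size of one closure instead of growing with every processed group. Whether that changes the overall run time on the 37 000-element input is not something the message establishes.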