Subject: Re: [xsl] Preferred declarative approach for outputting tallies based on complex triggers
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 10 Apr 2014 14:09:51 +0100
On Thu, Apr 10, 2014 at 2:00 PM, David Carlisle <davidc@xxxxxxxxx> wrote:
Well, quite. I was going to ask what you mean by "declarative/non-declarative" and "updating state variables" in an XSLT system.
Didn't you hear? Michael Kay showed us how to update variables in XSLT :-) (reference to another thread...)
Seriously, I was just referring to using a variable of the same name in a new scope, as in:
    <xsl:next-iteration>
      <xsl:with-param name="var1" select="$new.var1"/>
    </xsl:next-iteration>
Yes, well (I know :-), but since XSLT has both "variables" and "parameters", and that's a parameter, it's confusing in an XSLT context to call them variables.
I'd do something like this (untested):
    <xsl:variable name="sids" select="31,35"/>
    <!-- first special item, or the first item if there are no specials -->
    <xsl:variable name="a" select="(item[1],item[@id=$sids][1])[last()]"/>
    <!-- last special item, or the last item if there are no specials -->
    <xsl:variable name="b" select="(item[last()],item[@id=$sids][last()])[last()]"/>
    <!-- everything from $a to $b inclusive -->
    <xsl:variable name="s" select="$a|item[$a&lt;&lt;.][.&lt;&lt;$b]|$b"/>

    no of items <xsl:value-of select="count($s)"/>
    no of specials <xsl:value-of select="count($s[@id=$sids])"/>
    avg <xsl:value-of select="sum($s/@value) div count($s)"/>
Thanks, David. I was trying to avoid this approach for performance considerations. I wanted to do this type of analysis in a single pass.
Shrug. I just work on the assumption of absolute faith in Michael's ability to optimise whatever code I use to do the right thing at run time.
(Also, I don't think this specific implementation addresses the possibility that an item with the same @id might show up multiple times, with only the first one being "special". So I was expecting this approach to need something like "for $s in $sids return index-of($seq/@id, $s)" to retrieve the positions of the special items... this seemed to me to be quite expensive if there were several such items and the number of items is large.)
I didn't follow all the details, but use of keys (as in Muenchian grouping) and/or use of for-each-group can usually avoid the quadratic behaviour of a double loop (though it might still be O(n log n) rather than the O(n) you'd get from a single pass).
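To make the key suggestion concrete, here is a minimal, untested sketch of looking up the first item for each special id through a key instead of scanning the items once per id. The wrapper element "items", the key name "item-by-id" and the variable "first-specials" are assumptions made for illustration; only item, @id and $sids come from the thread, and $sids is written as strings here because key values taken from an untyped @id compare as strings.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every item by its @id so a lookup does not rescan the siblings. -->
  <xsl:key name="item-by-id" match="item" use="@id"/>

  <!-- Strings rather than numbers: an untyped @id key value compares as a string. -->
  <xsl:variable name="sids" select="('31', '35')"/>

  <!-- "items" is a hypothetical wrapper element holding the item children. -->
  <xsl:template match="items">
    <!-- key() returns nodes in document order, so [1] is the first occurrence. -->
    <xsl:variable name="first-specials"
        select="for $s in $sids return key('item-by-id', $s, .)[1]"/>
    <xsl:text>no of specials </xsl:text>
    <xsl:value-of select="count($first-specials)"/>
  </xsl:template>

</xsl:stylesheet>
```

How the index is built is up to the processor, so whether this ends up O(n) or O(n log n) overall is not something the stylesheet controls.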
So I was planning on using <xsl:iterate> and keeping track of information such as: Have I seen a special item yet? (This is what I mean by a "trigger": it signals when to start keeping track of data.)
What $sids have I not seen yet? (This lets me judge whether a given item is "special" since it is only special the first time.)
Yes, as Michael and I both said in our first replies, xsl:iterate is the same as a recursive function, except that it is set up for streaming use.
What is the sum of all values for @value among items I _know_ to be in the subsequence I care about?
What is the sum of all values for @value that I have seen since the last time I saw a special item? (This sum will be added to the above sum next time I see a special id for the first time.)
Getting two values out of a sequence without iterating over the sequence twice is of course a perennial problem in functional programming generally, and the answers are the same whatever the language: either don't worry about it (as in my code sketch), or use a recursive function (or an application of fold) with a function that returns some kind of structure that holds both results.
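As an untested illustration of the fold option, the sketch below uses the XPath 3.0 fold-left function with an accumulator that is simply a two-item sequence (count, sum), so both results come out of one traversal. It assumes an XSLT 3.0 processor, a hypothetical wrapper element "items", and that every item carries a numeric @value; the variable name "totals" is made up for the example.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- "items" is a hypothetical wrapper element holding the item children. -->
  <xsl:template match="items">
    <!-- The accumulator is a two-item sequence: (count so far, sum of @value so far). -->
    <xsl:variable name="totals" select="
        fold-left(item, (0, 0),
          function($acc, $i) {
            ($acc[1] + 1, $acc[2] + xs:decimal($i/@value))
          })"/>
    <xsl:text>no of items </xsl:text>
    <xsl:value-of select="$totals[1]"/>
    <xsl:text>&#10;avg </xsl:text>
    <!-- Guard against dividing by zero when there are no items at all. -->
    <xsl:value-of select="if ($totals[1] gt 0)
                          then $totals[2] div $totals[1]
                          else 'n/a'"/>
  </xsl:template>

</xsl:stylesheet>
```

A map would serve equally well as the structure; a plain sequence just keeps the sketch short.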
And similar information like the above that can then be brought together at the end to give the results I want in a single pass.
If your sequences are long enough that you need to worry about any of this, especially if they are long enough that you don't want to hold them all in memory, then xsl:iterate is likely to be the only approach that doesn't run out of memory so "preferred" may not be the right description.
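For what it's worth, the bookkeeping described above might look roughly like the following untested xsl:iterate sketch. It assumes an XSLT 3.0 processor, a hypothetical wrapper element "items", numeric @id and @value on every item, and that the subsequence of interest runs from the first special item to the last first-occurrence of a special id. All parameter names are invented for the example.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>
  <xsl:variable name="sids" select="(31, 35)"/>

  <!-- "items" is a hypothetical wrapper element holding the item children. -->
  <xsl:template match="items">
    <xsl:iterate select="item">
      <!-- Special ids not yet seen; an id is special only on first sight. -->
      <xsl:param name="unseen" select="$sids"/>
      <!-- Number of special items seen; non-zero means the trigger has fired. -->
      <xsl:param name="specials" select="0"/>
      <!-- Totals for items known to lie inside the subsequence. -->
      <xsl:param name="count" select="0"/>
      <xsl:param name="sum" select="0"/>
      <!-- Totals since the last special item, not yet known to be inside. -->
      <xsl:param name="pending-count" select="0"/>
      <xsl:param name="pending-sum" select="0"/>

      <xsl:on-completion>
        <xsl:text>no of items </xsl:text>
        <xsl:value-of select="$count"/>
        <xsl:text>&#10;no of specials </xsl:text>
        <xsl:value-of select="$specials"/>
        <xsl:text>&#10;avg </xsl:text>
        <xsl:value-of select="if ($count gt 0) then $sum div $count else 'n/a'"/>
      </xsl:on-completion>

      <xsl:variable name="id" select="xs:integer(@id)"/>
      <xsl:variable name="value" select="xs:decimal(@value)"/>

      <xsl:choose>
        <!-- First sighting of a special id: commit the pending totals and this item. -->
        <xsl:when test="$id = $unseen">
          <xsl:next-iteration>
            <xsl:with-param name="unseen" select="$unseen[. != $id]"/>
            <xsl:with-param name="specials" select="$specials + 1"/>
            <xsl:with-param name="count" select="$count + $pending-count + 1"/>
            <xsl:with-param name="sum" select="$sum + $pending-sum + $value"/>
            <xsl:with-param name="pending-count" select="0"/>
            <xsl:with-param name="pending-sum" select="0"/>
          </xsl:next-iteration>
        </xsl:when>
        <!-- After the trigger: hold the item as pending until the next special. -->
        <xsl:when test="$specials gt 0">
          <xsl:next-iteration>
            <xsl:with-param name="pending-count" select="$pending-count + 1"/>
            <xsl:with-param name="pending-sum" select="$pending-sum + $value"/>
          </xsl:next-iteration>
        </xsl:when>
        <!-- Before the first special item nothing is recorded; parameters carry over. -->
      </xsl:choose>
    </xsl:iterate>
  </xsl:template>

</xsl:stylesheet>
```

Items seen after the last special item stay in the pending totals and are never committed, which is what keeps them out of the final figures.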