Re: [xsl] efficient traversal of combined collections in XSLT 3.0

From: Graydon <graydon@xxxxxxxxx>
Date: Tue, 27 Nov 2012 04:58:22 -0500

On Sat, Nov 24, 2012 at 03:27:24PM +0000, Michael Kay scripsit:
> The way we do this in maintaining the XSLT/XQuery specs (admittedly
> much smaller than your 4GB) is to maintain a derived document
> containing a list of valid link targets. This is regenerated when
> the base documents change, which is less frequently than the list is
> used. The list of valid anchors is much smaller than the base
> documents, so it can be loaded more quickly, and uses less memory.

That gets saxon:discard-document() to work.  (Well, up until the point
where the transform fails with no error message _and_ closes the outer
loop; something, somewhere, is awful in the input, which is not a
surprise but is hard to find!)

I _suspect_, but could not take the time to prove, that the use of 

for $x in collection($pathToContent) return
(saxon:discard-document($x)//link,saxon:discard-document($x)//target[not(.//link)])

means that discard-document can't tell it is supposed to let go.
Separating those out into distinct for-each statements made things
behave in a much more useful fashion.
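
As a minimal sketch of what the separated form might look like (an
illustration, not the actual stylesheet): each xsl:for-each below makes
its own pass over the collection and calls saxon:discard-document()
once per document.  The template name "main", the wrapper element
"anchors", and the way $pathToContent is supplied are assumptions;
saxon:discard-document() needs a Saxon edition that provides the Saxon
extension functions.

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:saxon="http://saxon.sf.net/"
        exclude-result-prefixes="saxon">

      <!-- Collection URI, supplied by the caller (assumed) -->
      <xsl:param name="pathToContent" required="yes"/>

      <!-- Entry point; the name "main" is hypothetical -->
      <xsl:template name="main">
        <!-- "anchors" is a hypothetical wrapper element -->
        <anchors>
          <!-- First pass: copy the links, one discard call per document -->
          <xsl:for-each select="collection($pathToContent)">
            <xsl:copy-of select="saxon:discard-document(.)//link"/>
          </xsl:for-each>
          <!-- Second pass: copy the targets that contain no links -->
          <xsl:for-each select="collection($pathToContent)">
            <xsl:copy-of
                select="saxon:discard-document(.)//target[not(.//link)]"/>
          </xsl:for-each>
        </anchors>
      </xsl:template>

    </xsl:stylesheet>

This reads the collection twice, which trades extra parsing for not
having two discard-document() calls hanging off the same $x.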

> Also, generating the list of anchors is an operation that can be
> streamed; hopefully the resulting list is small enough that it can
> be held in memory for look-up purposes.

It can; once I've got the list of anchors the compare runs in about
fifteen seconds.
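
For illustration only, a look-up against a small in-memory anchor list
could be sketched with xsl:key along these lines.  The file name
anchors.xml and the @id and @ref attributes are hypothetical; the real
markup is not shown in this thread.  The stylesheet would be applied
to a document holding the extracted link elements.

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="text"/>

      <!-- Hypothetical file name for the derived anchor list -->
      <xsl:param name="anchors" select="doc('anchors.xml')"/>

      <!-- Hypothetical markup: target elements identified by @id -->
      <xsl:key name="target-by-id" match="target" use="@id"/>

      <xsl:template match="/">
        <!-- Report each link whose @ref matches no target in the list -->
        <xsl:for-each
            select="//link[not(key('target-by-id', @ref, $anchors))]">
          <xsl:value-of select="concat('unresolved: ', @ref, '&#10;')"/>
        </xsl:for-each>
      </xsl:template>

    </xsl:stylesheet>

The three-argument form of key() searches the anchor document rather
than the document being processed, so the index is built once over
the small list.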

Thank you!

-- Graydon, who keeps getting freaked out by the orders-of-magnitude
run-time differences from apparently small code changes
