Subject: Re: [xsl] optimization of complex XPath
From: Graydon <graydon@xxxxxxxxx>
Date: Sun, 21 Nov 2010 08:32:10 -0500
On Fri, Nov 19, 2010 at 02:16:10PM +0100, Wolfgang Laun scripsit:
> On 19 November 2010 13:31, Graydon <graydon@xxxxxxxxx> wrote:
> > On Fri, Nov 19, 2010 at 11:38:40AM +0100, Wolfgang Laun scripsit:
> >> So the initially posted loop will have to be extended to iterate over
> >> 90,000 files...?
> >
> > Yes.
> 
> Then even Michael's O(n.log(n)) might be beyond the tolerance limit.

It could be, but I certainly ought to try it.

> Repeating myself: I'd do a single pass XSLT extraction of links and
> targets, followed by grep, sort -u and comm, and spend the saved time
> surfing ;-)

I'd love to do that, and this time I could almost do it, but
unfortunately it's the _targets_ that keep their area value in an
ancestor.  So it's quite possible to have a case where we have the same
num value but an area belonging to content we haven't converted yet, in
which case we don't know whether we have that link target or not.

Which means that, slow as it is, the XPath/XQuery approach is much, much
simpler than building some sort of area-associating parser for the cite
values on the num elements; I can't just use grep.
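For concreteness, something along these lines is what I mean by the
XPath/XQuery approach (a rough sketch only: the element and attribute
names here, link, num and @area, stand in for the real vocabulary, and
the collection() select parameter is Saxon's convention):

  let $docs := collection('corpus?select=*.xml')

  (: each target's area lives on an ancestor, so it has to be
     paired up with the num value here, not grepped out later :)
  let $targets :=
    for $n in $docs//num
    return concat($n/ancestor::*[@area][1]/@area, '#', string($n))

  (: report links whose area#num key has no matching target :)
  for $l in $docs//link
  where not(concat($l/@area, '#', $l/@num) = $targets)
  return $l

That's the naive per-link scan of $targets, which is where the
quadratic cost comes from; the point of Michael's O(n.log(n))
approach is to avoid exactly that kind of lookup.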

> Adding another idea: Processing single-file results after each
> individual file's processing will leave you with the remainder that's
> either broken or must be matched cross-file-wise, which might help if
> most links are file-local.

It would, yes.  Alas, they are not.

Thanks!
Graydon
