Subject: RE: [xsl] Benefits of using xsl:key From: "Michael Kay" <mike@xxxxxxxxxxxx> Date: Tue, 3 Nov 2009 22:27:06 -0000 |
> By modifying the simple example by Michael M|ller-Hillebrand: > > Input.xml > > <Doc> > <value oid="f37">some text</value> > <value oid="f61">some text</value> > <value oid="f042">some other text</value> </Doc> > > Copy and pasting so we have 3000 lines. The exact way in which you created this file is probably very important. If you did it by naive copy-and-paste, then it's likely that the duplicate values in the file are very close to each other and that the distance between duplicates is consistently small. This means that the pattern match="value[.=preceding::value]" is not nearly as expensive to evaluate as one might expect: it doesn't involve a search back to the beginning of the file, only to the nearest duplicate value. I was initially surprised that introducing keys to this example made so much difference, given that you left this match pattern unchanged; but the explanation for this would lie in the predictable data distribution. By contrast, this code > <xsl:attribute name="xrefid" > select="preceding::value[.=current()][last()]/@oid"/> is looking for the first (i.e. most distant) duplicate, which will in general be a long search. I suspect that with most real-life data, evaluating the pattern would be much more expensive than it is with this synthetic data, and therefore introducing the key would have less impact. >I have a feeling that optimization in EE is not for "preceding axis" problems, but for "standard" transformations of huge input files? Saxon-EE has many optimizations, but won't optimize this particular construct. The reason is that the sequence to which the predicate is applied (preceding::value) is only searched once; it's not worth building an index to support a search that is only done once. I'm always on the look-out for new optimization opportunities (though they have to occur in constructs that are frequent enough to justify the extra complexity). I think it would be potentially possible to extend the indexing optimization to this case by recognizing that one can rewrite preceding::value[X=Y] as //value[X=Y][.<<current()], which would then be rewritten to use an index. In fact, I suspect that rewriting preceding::X as //X[.<<current()] might well be a worthwhile rewrite even without the indexing, as it would take advantage of the automatic indexing of //X which occurs even with Saxon-HE: though it needs care because the "natural" order of the results is different. It's worth noting that your rewrite of select="preceding::value[.=current()][last()]/@oid" by select="key('value-content', .)[1]" works only because you know (from the semantics of the match pattern for the template) that the context item cannot be the first in its key-group. Regards, Michael Kay http://www.saxonica.com/ http://twitter.com/michaelhkay
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] Benefits of using xsl:key, Jesper Tverskov | Thread | Re: [xsl] Benefits of using xsl:key, Jesper Tverskov |
Re: [xsl] generating unique ids, Vasu Chakkera | Date | Re: [xsl] Benefits of using xsl:key, Jesper Tverskov |
Month |