Subject: RE: [xsl] Latest XSLTMark benchmark
From: "Michael Kay" <mhkay@xxxxxxxxxxxx>
Date: Mon, 2 Apr 2001 11:59:33 +0100
> > I have just had a look at the Saxon driver and think you are correct. This also appears to have been true in the earlier (1_2_1) release as well. It was clearly intended that the loadInput() call should actually load the input, as in the other drivers, not just open an input stream.
>
> It was there in version 1.1.1, as well. Everyone, including Michael Kay, missed it -- the disagreement between the documentation, the code in the other drivers, and the Saxon driver. The intention (which may not be practical) was always to measure just the XSLT transformation.

I missed it because the intention certainly wasn't clear. In fact, the documentation refers anyone who wants to write a driver to the supplied driver for xt, which builds the source document tree inside the loop, not outside it.

I do think that the benchmark should be measuring parsing plus transformation plus serialization, because that is the most typical usage scenario, and because if you don't measure that, a processor that optimizes parsing or serialization based on knowledge of the stylesheet gets no credit for it.

For example, a processor might be able to serialize faster if it knows that neither the source document nor the stylesheet contains any Unicode characters outside the target encoding. Or it might be able to use stylesheet information to parse faster, perhaps by avoiding entity expansion in parts of the document that are never accessed, or by not parsing at all those sections of the document that the transformation does not require.

The most likely situation that affects current processors, however, is whitespace stripping. There are basically three ways to do whitespace stripping: do it while building the tree, modify the tree after building it, or leave the whitespace on the tree but ensure that it has no effect. (There is a fourth way, which is to decide not to conform to the standard.)
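The first two strategies can be made concrete with a minimal Java sketch that uses only the JAXP, SAX and DOM APIs shipped with the JDK. It is a hypothetical illustration, not Saxon's code: the class and method names are invented, and the `stripNames` set is an assumption standing in for the element names a processor would read from the stylesheet's xsl:strip-space declaration (plain name matching only; wildcards, xsl:preserve-space and xml:space are ignored). `buildStripped` drops whitespace-only text while SAX events are being turned into a tree; `buildThenStrip` parses a complete DOM and then prunes it in a second pass.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not Saxon's code; stripNames stands in for xsl:strip-space.
public final class WhitespaceStripping {

    // XML whitespace is exactly space, tab, CR and LF.
    static boolean isXmlWhitespace(CharSequence s) {
        return s.chars().allMatch(c -> c == ' ' || c == '\t' || c == '\r' || c == '\n');
    }

    // Approach 1: strip while building. The strip list must be known before
    // parsing starts, because the unwanted text nodes are never created.
    public static Document buildStripped(String xml, Set<String> stripNames) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new StrippingTreeBuilder(doc, stripNames));
        return doc;
    }

    // Builds a DOM from SAX events; comments, PIs and xml:space are ignored here.
    private static final class StrippingTreeBuilder extends DefaultHandler {
        private final Document doc;
        private final Set<String> stripNames;
        private final Deque<Node> open = new ArrayDeque<>();
        private final StringBuilder pending = new StringBuilder();
        StrippingTreeBuilder(Document doc, Set<String> stripNames) {
            this.doc = doc;
            this.stripNames = stripNames;
            open.push(doc);
        }

        // SAX may split one text run over several characters() calls, so the
        // run is buffered and judged only when the next tag arrives.
        private void flush() {
            Node parent = open.peek();
            boolean drop = isXmlWhitespace(pending) && stripNames.contains(parent.getNodeName());
            if (pending.length() > 0 && !drop) parent.appendChild(doc.createTextNode(pending.toString()));
            pending.setLength(0);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flush();
            Element e = doc.createElementNS(uri.isEmpty() ? null : uri, qName);
            for (int i = 0; i < atts.getLength(); i++) {
                String au = atts.getURI(i);
                e.setAttributeNS(au.isEmpty() ? null : au, atts.getQName(i), atts.getValue(i));
            }
            open.peek().appendChild(e);
            open.push(e);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();
            open.pop();
        }
        @Override
        public void characters(char[] ch, int start, int length) {
            pending.append(ch, start, length);
        }
    }

    // Approach 2: build the complete tree, then walk it a second time to prune it.
    public static Document buildThenStrip(String xml, Set<String> stripNames) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        stripTree(doc.getDocumentElement(), stripNames);
        return doc;
    }

    private static void stripTree(Node node, Set<String> stripNames) {
        for (Node child = node.getFirstChild(), next; child != null; child = next) {
            next = child.getNextSibling(); // saved before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripTree(child, stripNames);
            } else if (child.getNodeType() == Node.TEXT_NODE && stripNames.contains(node.getNodeName())
                    && isXmlWhitespace(child.getNodeValue())) {
                node.removeChild(child);
            }
        }
    }
}
```

Applied to the same input with the same strip list, both methods should yield equal document elements (comparable with `Node.isEqualNode`). The difference is where the work happens: the first needs the strip list before parsing begins, which a stylesheet-independent tree build rules out, while the second pays for an extra traversal and node removals over a tree that has already been built.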
The first approach is by far the most efficient, but by insisting that the tree is built without any knowledge of the stylesheet, you are effectively ruling it out. In Saxon's case this will force the second approach, which is far more expensive.

There is a trade-off between the time taken to build the tree and the time taken to do the transformation, and Saxon is trying to minimize the sum of the two. The Saxon "tinytree" model deliberately reduces the time spent building the tree at the expense of the time spent navigating it, based on the observation that the average number of visits a stylesheet makes to each source node is often about one. Your proposal will encourage implementors to spend more time building the tree in order to spend less time transforming it, which is not necessarily the right design approach for "real life".

The current approach is also open to "cheating". For example, it would be quite legitimate for a processor to cache the index structures used to implement keys, so that if the same stylesheet is applied twice to the same source document, the indexes do not need to be rebuilt. Implementing such a cache would boost a processor's rating in this benchmark far more than the technique actually warrants in real life.

Mike Kay
Software AG

XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list