Subject: RE: [xsl] Static Validation of XSL Transformations From: Anders Møller <amoeller@xxxxxxxx> Date: Fri, 09 Mar 2007 10:31:01 +0100 |
From: Anders Mxller <amoeller@xxxxxxxx> >Given an XSLT 2.0 stylesheet, S, and two schemas, D_in and D_out, the >tool is able to check statically that all output of S at runtime is >valid according to D_out assuming that the input is valid according to >D_in. Additionally, the tool produces a flow graph of S. Schemas are >written in either DTD, XML Schema, or Restricted RELAX NG. >A research paper describing the analysis is also available from the web >site.
That was very interesting.
Does it cope with doc/document/collection functions? I couldn't work out from your paper how it would do that.
An interesting (and very thorough) paper.
Like Colin, I was wondering as I read it how you deal with multiple = input documents (and schemas). And indeed, multiple output documents. One of = the problems is that when you look at a template rule in isolation, you = really don't know whether it is writing part of the final result tree, or some intermediate temporary tree whose structure is statically unknown.
Compared with Saxon's static analysis and type checking, you seem to be making a lot of what one might call "95% guesses". For example, you seem = to be assuming that if you see <xsl:template match=3D"x">, and there is an element x in the schema, then the actual element is going to be a valid instance of that type. (What if there's more than one x in the schema, a global one and a local one?) Saxon doesn't make this assumption; if you = want this kind of checking, you have to write match=3D"schema-element(x)". Of course the assumption is likely to be right 95% of the time, and for a "lint" kind of tool that's fine, but I don't feel it's appropriate for a production compiler. (We had a long debate about this at a W3C meeting = under the heading of "the assumption of validity").
Clearly the more complex the stylesheet becomes, in terms of handling multiple inputs and temporary trees, the more likely it is that there = will be data around that for some good reason does NOT conform to the input schema.
The other technique that you're using which Saxon doesn't currently use = is to analyze across apply-templates calls. I think there's probably quite = a lot of mileage in doing this. In fact Saxon doesn't even analyze across call-template or function calls unless you actually declare the type of = the result of the target template or function.
One thing that's interesting about the paper, I think, is to see how = much can be done simply with knowledge of the input and output schemas, = without any other changes to the stylesheet to incorporate type declarations and = to explicitly invoke validation. There's a lot to be said for this, because XSLT programmers take a lot of re-educating to declare types of = variables and parameters and the more you can do in the absence of such = declarations, the better. I've been moving in similar directions with Saxon in some of = the optimizations that are done, for example trying to infer when
<xsl:variable name=3D"x"><xsl:value-of select=3D"EXP"/></xsl:variable>
can be safely rewritten as
<xsl:variable name=3D"x" select=3D"EXP"/>
by analyzing the expressions in which $x is used. Of course, when you're doing optimization a 95% guess certainly isn't good enough, you need to = be absolutely sound. (Though you can use guesses to decide between = different evaluation strategies, of course, provided that the strategy still works = if the guess is wrong.)
It's interesting to note that you ask the user for two pieces of information: the schema for the input document and the schema for the result. In XSLT 2.0 we don't actually provide syntax to allow the user = to declare the schema for the input document. We tried to tackle that a = number of times but never quite found a solution that worked, but I'm really = not quite sure why not! It's partly of course that a complex stylesheet can = have multiple entry points and can be designed to do more than one job = depending on the entry point that you choose. But again, that's the 5% case rather than the 95%.
-- Anders Moeller amoeller@xxxxxxxx http://www.brics.dk/~amoeller
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
RE: [xsl] date calculations, Michael Kay | Thread | [xsl] Fwd: Removing blank lines in , Greg Groves |
Re: [xsl] Schema-aware validation o, Jesper Tverskov | Date | Re: [xsl] Schema-aware validation o, Andrew Welch |
Month |