Subject: RE: [xsl] Static Validation of XSL Transformations
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 6 Mar 2007 22:44:31 -0000
An interesting (and very thorough) paper. Like Colin, I was wondering as I read it how you deal with multiple input documents (and schemas). And indeed, multiple output documents. One of the problems is that when you look at a template rule in isolation, you really don't know whether it is writing part of the final result tree, or some intermediate temporary tree whose structure is statically unknown.

Compared with Saxon's static analysis and type checking, you seem to be making a lot of what one might call "95% guesses". For example, you seem to be assuming that if you see <xsl:template match="x">, and there is an element x in the schema, then the actual element is going to be a valid instance of that type. (What if there's more than one x in the schema, a global one and a local one?) Saxon doesn't make this assumption; if you want this kind of checking, you have to write match="schema-element(x)". Of course the assumption is likely to be right 95% of the time, and for a "lint" kind of tool that's fine, but I don't feel it's appropriate for a production compiler. (We had a long debate about this at a W3C meeting under the heading of "the assumption of validity".) Clearly, the more complex the stylesheet becomes in terms of handling multiple inputs and temporary trees, the more likely it is that there will be data around that for some good reason does NOT conform to the input schema.

The other technique that you're using which Saxon doesn't currently use is to analyze across apply-templates calls. I think there's probably quite a lot of mileage in doing this. In fact Saxon doesn't even analyze across call-template or function calls unless you actually declare the type of the result of the target template or function.
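To make the parenthetical question about a global and a local x concrete, here is a minimal sketch using Python's standard-library ElementTree. The schema and the helper name `declarations_named` are hypothetical, invented for illustration. It simply lists every declaration named x and shows that the name alone does not say which type applies. That ambiguity is why an explicit match="schema-element(x)", which refers to the global declaration, is the safer basis for type checking.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: a global <x> of type xs:integer and a local <x>
# (declared inside <y>) of type xs:date.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="x" type="xs:integer"/>
  <xs:element name="y">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declarations_named(schema_root, name):
    """Return (scope, type) for each element declaration with this name.

    Direct children of xs:schema are global declarations; anything nested
    deeper is local. References (xs:element ref="...") are ignored here.
    """
    global_ids = {id(child) for child in schema_root}
    found = []
    for decl in schema_root.iter(XS + "element"):
        if decl.get("name") == name:
            scope = "global" if id(decl) in global_ids else "local"
            found.append((scope, decl.get("type")))
    return found


root = ET.fromstring(SCHEMA)
decls = declarations_named(root, "x")
print(decls)  # [('global', 'xs:integer'), ('local', 'xs:date')]
# A check keyed only on the name in match="x" cannot tell these apart.
print("ambiguous" if len(decls) > 1 else "unique")  # ambiguous
```

A checker that looks up "x" by name has to pick one of these two entries, which is exactly the kind of 95% guess discussed above.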
One thing that's interesting about the paper, I think, is to see how much can be done simply with knowledge of the input and output schemas, without any other changes to the stylesheet to incorporate type declarations and to explicitly invoke validation. There's a lot to be said for this, because XSLT programmers take a lot of re-educating to declare types of variables and parameters, and the more you can do in the absence of such declarations, the better.

I've been moving in similar directions with Saxon in some of the optimizations that are done, for example trying to infer when <xsl:variable name="x"><xsl:value-of select="EXP"/></xsl:variable> can be safely rewritten as <xsl:variable name="x" select="EXP"/> by analyzing the expressions in which $x is used. Of course, when you're doing optimization a 95% guess certainly isn't good enough; you need to be absolutely sound. (Though you can use guesses to decide between different evaluation strategies, of course, provided that the strategy still works if the guess is wrong.)

It's interesting to note that you ask the user for two pieces of information: the schema for the input document and the schema for the result. In XSLT 2.0 we don't actually provide syntax to allow the user to declare the schema for the input document. We tried to tackle that a number of times but never quite found a solution that worked, and I'm really not quite sure why not! It's partly, of course, that a complex stylesheet can have multiple entry points and can be designed to do more than one job depending on the entry point that you choose. But again, that's the 5% case rather than the 95%.
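The variable rewrite mentioned above depends on the fact that the two forms bind $x to different things: the value-of form gives a temporary document node containing one text node, while the select form gives whatever EXP returns. The sketch below is a hypothetical, deliberately conservative check in Python, not Saxon's actual analysis. The names `StaticType`, `STRING_VALUE_USES` and `rewrite_is_safe`, and the whitelist of functions, are all assumptions made for illustration. It approves the rewrite only when EXP is statically exactly one xs:string and every use of $x depends only on its string value.

```python
from dataclasses import dataclass

# Hypothetical, much-simplified model of the rewrite
#   <xsl:variable name="x"><xsl:value-of select="EXP"/></xsl:variable>
#   -->  <xsl:variable name="x" select="EXP"/>
# Form 1 binds $x to a temporary document node holding one text node;
# form 2 binds $x to whatever EXP returns. They agree only in some contexts.


@dataclass(frozen=True)
class StaticType:
    item_type: str    # e.g. "xs:string", "xs:integer", "node()"
    cardinality: str  # "1" (exactly one), "?", "*" or "+"


# Hypothetical whitelist: functions whose result depends only on the string
# value of their argument, so the document node and the string agree.
STRING_VALUE_USES = frozenset(
    {"string", "concat", "string-length", "contains", "starts-with"}
)


def rewrite_is_safe(exp_type, uses):
    """Conservatively decide whether the rewrite preserves behaviour.

    `uses` names the function each reference to $x is passed to. Anything
    not provably equivalent is rejected, so some safe rewrites are missed.
    """
    # Require EXP to be exactly one xs:string. For instance, with several
    # items value-of joins them with spaces into one text node, whereas
    # string($x) on the select form would raise a type error.
    if exp_type != StaticType("xs:string", "1"):
        return False
    return all(use in STRING_VALUE_USES for use in uses)


one_string = StaticType("xs:string", "1")
many_nodes = StaticType("node()", "*")
print(rewrite_is_safe(one_string, ["string", "concat"]))  # True
print(rewrite_is_safe(one_string, ["count"]))             # False (conservative)
print(rewrite_is_safe(many_nodes, ["string"]))            # False
```

The design favours soundness over completeness: count($x) would in fact give 1 either way here, but since it is not on the whitelist the check refuses, in keeping with the point that an optimizer cannot act on a 95% guess.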
Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Anders Møller [mailto:amoeller@xxxxxxxx]
> Sent: 06 March 2007 12:54
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Static Validation of XSL Transformations
>
> An online version of our XSLT analysis tool is now available at
>
> http://www.brics.dk/XSLV/
>
> Given an XSLT 2.0 stylesheet, S, and two schemas, D_in and
> D_out, the tool is able to check statically that all output
> of S at runtime is valid according to D_out assuming that the
> input is valid according to D_in. Additionally, the tool
> produces a flow graph of S. Schemas are written in either
> DTD, XML Schema, or Restricted RELAX NG.
> A research paper describing the analysis is also available
> from the web site.
>
> --
> Anders Moeller
> amoeller@xxxxxxxx
> http://www.brics.dk/~amoeller