RE: [xsl] Static Validation of XSL Transformations

Subject: RE: [xsl] Static Validation of XSL Transformations
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 6 Mar 2007 22:44:31 -0000
An interesting (and very thorough) paper.

Like Colin, I was wondering as I read it how you deal with multiple input
documents (and schemas). And indeed, multiple output documents. One of the
problems is that when you look at a template rule in isolation, you really
don't know whether it is writing part of the final result tree, or some
intermediate temporary tree whose structure is statically unknown.

Compared with Saxon's static analysis and type checking, you seem to be
making a lot of what one might call "95% guesses". For example, you seem to
be assuming that if you see <xsl:template match="x">, and there is an
element x in the schema, then the actual element is going to be a valid
instance of that type. (What if there's more than one x in the schema, a
global one and a local one?) Saxon doesn't make this assumption; if you want
this kind of checking, you have to write match="schema-element(x)". Of
course the assumption is likely to be right 95% of the time, and for a
"lint" kind of tool that's fine, but I don't feel it's appropriate for a
production compiler. (We had a long debate about this at a W3C meeting under
the heading of "the assumption of validity").
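
To make the distinction concrete, here is a minimal sketch of the two kinds of match pattern side by side. It assumes a schema-aware XSLT 2.0 processor and a hypothetical no-namespace schema file, input.xsd, that declares a global element x; the file name, the explicit priority, and the output elements are all illustrative assumptions rather than anything taken from the paper or from Saxon.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: input.xsd is a hypothetical schema with a global element declaration for x. -->
  <xsl:import-schema schema-location="input.xsd"/>

  <!-- Name test only: matches ANY element named x, whether or not it was
       validated, and whether it corresponds to a global or a local declaration. -->
  <xsl:template match="x">
    <unchecked-x><xsl:apply-templates/></unchecked-x>
  </xsl:template>

  <!-- Schema-element test: matches only an x that has been validated against
       the global declaration of x (or a member of its substitution group).
       The explicit priority makes this rule win when both rules match. -->
  <xsl:template match="schema-element(x)" priority="1">
    <validated-x><xsl:apply-templates/></validated-x>
  </xsl:template>

</xsl:stylesheet>
```

With the first rule the processor can infer nothing about the content of x; with the second it is entitled to rely on the declared type, because the match itself guarantees validity.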

Clearly the more complex the stylesheet becomes, in terms of handling
multiple inputs and temporary trees, the more likely it is that there will
be data around that for some good reason does NOT conform to the input
schema.

The other technique that you're using which Saxon doesn't currently use is
to analyze across apply-templates calls. I think there's probably quite a
lot of mileage in doing this. In fact Saxon doesn't even analyze across
call-template or function calls unless you actually declare the type of the
result of the target template or function.
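
As a rough illustration of what "declaring the type of the result" looks like in XSLT 2.0, the sketch below puts an "as" attribute on a function and on a named template. The function name f:total, its namespace, the template name heading, and the item/@price input vocabulary are hypothetical, chosen only for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Declared result type: a caller can be checked against xs:decimal
       without looking inside the function body. -->
  <xsl:function name="f:total" as="xs:decimal">
    <xsl:param name="items" as="element(item)*"/>
    <xsl:sequence select="sum($items/xs:decimal(@price), 0.0)"/>
  </xsl:function>

  <!-- Declared result type on a named template: exactly one h1 element. -->
  <xsl:template name="heading" as="element(h1)">
    <xsl:param name="title" as="xs:string"/>
    <h1><xsl:value-of select="$title"/></h1>
  </xsl:template>

  <xsl:template match="/">
    <div>
      <xsl:call-template name="heading">
        <xsl:with-param name="title" select="'Order total'"/>
      </xsl:call-template>
      <p><xsl:value-of select="f:total(//item)"/></p>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Without the "as" attributes the calls would still work, but a processor that does not analyze across the call would have to treat the results as arbitrary sequences of items.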

One thing that's interesting about the paper, I think, is to see how much
can be done simply with knowledge of the input and output schemas, without
any other changes to the stylesheet to incorporate type declarations and to
explicitly invoke validation. There's a lot to be said for this, because
XSLT programmers take a lot of re-educating to declare types of variables
and parameters, and the more you can do in the absence of such declarations,
the better. I've been moving in similar directions with Saxon in some of the
optimizations that are done, for example trying to infer when

<xsl:variable name="x"><xsl:value-of select="EXP"/></xsl:variable>

can be safely rewritten as

<xsl:variable name="x" select="EXP"/>

by analyzing the expressions in which $x is used. Of course, when you're
doing optimization a 95% guess certainly isn't good enough; you need to be
absolutely sound. (Though you can use guesses to decide between different
evaluation strategies, of course, provided that the strategy still works if
the guess is wrong.)
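
A small sketch of why the usages of the variable matter for this rewrite. The input vocabulary (a book element with one title and several author children) is a made-up assumption, and the comments describe what the XSLT 2.0 rules imply in general, not what any particular version of Saxon actually does.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="book">

    <!-- Case 1: only the string value of $t is used. Provided "title" selects
         at most one node, select="title" would give the same result, so the
         temporary tree need not be built. -->
    <xsl:variable name="t"><xsl:value-of select="title"/></xsl:variable>
    <xsl:value-of select="string-length($t)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Case 2: the usages can observe the difference. Here $a is a single
         document node, so count($a) is always 1 and the instance-of test is
         true. With select="author" the count would be the number of author
         elements and the test would be false, so the rewrite is unsound. -->
    <xsl:variable name="a"><xsl:value-of select="author"/></xsl:variable>
    <xsl:value-of select="count($a)"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="$a instance of document-node()"/>
    <xsl:text>&#10;</xsl:text>

  </xsl:template>

</xsl:stylesheet>
```

The first case is the kind where an optimizer can prove the two forms equivalent from the usages alone; the second is the kind it must leave untouched.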

It's interesting to note that you ask the user for two pieces of
information: the schema for the input document and the schema for the
result. In XSLT 2.0 we don't actually provide syntax to allow the user to
declare the schema for the input document. We tried to tackle that a number
of times but never quite found a solution that worked, but I'm really not
quite sure why not! It's partly of course that a complex stylesheet can have
multiple entry points and can be designed to do more than one job depending
on the entry point that you choose. But again, that's the 5% case rather
than the 95%.

Michael Kay

> -----Original Message-----
> From: Anders Møller [mailto:amoeller@xxxxxxxx]
> Sent: 06 March 2007 12:54
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Static Validation of XSL Transformations
> An online version of our XSLT analysis tool is now available at
> Given an XSLT 2.0 stylesheet, S, and two schemas, D_in and
> D_out, the tool is able to check statically that all output
> of S at runtime is valid according to D_out assuming that the
> input is valid according to D_in. Additionally, the tool
> produces a flow graph of S. Schemas are written in either
> DTD, XML Schema, or Restricted RELAX NG.
> A research paper describing the analysis is also available
> from the web site.
> --
> Anders Moeller
> amoeller@xxxxxxxx
