Subject: RE: [xsl] XSLT 2.0: Schema-aware processor: What are the compelling advantages over a non-SA processor?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 18 Jul 2007 09:18:22 +0100
> To continue on my journey of XSLT enlightenment, I'd like to 
> know what the compelling reasons are to use an XSLT 2.0 
> schema-aware processor.

Start with this article:
http://www.stylusstudio.com/schema_aware.html

It's basically all about good software engineering. In any program, if you
make assertions about your inputs and outputs, whether at the level of the
whole program or at a finer-grained level of individual functions, then
readers of your code (including you) have a better chance of understanding
what the code is supposed to do, and if the assertions are machine-readable,
then there's a good chance you will get better and earlier diagnostics when
you make mistakes. This gives you a faster coding/debugging cycle, and given
that no-one ever tests code perfectly, it also increases the chance that your
code is bug-free when it goes live, and that it remains bug-free when
maintenance programmers make changes. In
many cases it makes the difference between getting "no output" (the classic
XSLT debugging nightmare, especially when dealing with complex industry
schemas like FpML), and getting a compile-time error message telling you
where your path expression is wrong.

For XSLT, referring to a schema is the natural way to make these assertions,
especially if schemas for your input and output vocabularies already exist.

There's also a spin-off in terms of performance. For some people this is the
primary goal; as far as I'm concerned robustness comes first, and that's
what I concentrated on first with Saxon. But the two objectives don't
conflict in any way. The performance argument is essentially that the more the XSLT
compiler knows about the nature of the input (and to a lesser degree, the
output), the more it can compile access paths that have those assumptions
built in. To take a simple example, if you write //title, and the processor
knows there can only be one title element and it is within the document
header, then it can avoid a whole-document search. At a finer-grained level
there are many cases where knowing at compile time that, say, a value is an
integer allows decisions to be made at compile time that would otherwise be
made at run time. (In fact you can often get these benefits without going
all the way to schema-awareness simply by making a habit of declaring the
types of your variables and parameters.)
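As an illustration of that last point, here is a minimal sketch (not part of the original message) of what declaring types looks like in a plain, non-schema-aware XSLT 2.0 stylesheet, using the "as" attribute on xsl:function, xsl:param and xsl:variable. The function name f:line-total, the namespace urn:example:functions and the item/@qty/@price input structure are all hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical function: because the parameters and the result are
       declared, the compiler knows $qty * $price is integer * decimal
       and can report a type error at compile time if a caller passes
       something incompatible. -->
  <xsl:function name="f:line-total" as="xs:decimal">
    <xsl:param name="qty" as="xs:integer"/>
    <xsl:param name="price" as="xs:decimal"/>
    <xsl:sequence select="$qty * $price"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Without a schema the input is untyped, so the attribute values
         are cast explicitly to the declared parameter types. -->
    <xsl:variable name="total" as="xs:decimal"
        select="sum(//item/f:line-total(xs:integer(@qty), xs:decimal(@price)))"/>
    <total><xsl:value-of select="$total"/></total>
  </xsl:template>

</xsl:stylesheet>
```

With a schema-aware processor you would add an xsl:import-schema declaration and validate the input; assuming the schema declared @qty as xs:integer and @price as xs:decimal, the explicit casts could then be dropped, because the attributes would already carry typed values.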

There is a discipline involved in using schema-awareness. It's tempting to
write the minimum amount of code needed to get things working, leaving out
all those pesky type declarations. But the
more complex the project, and the longer-lived the application that you are
writing, the bigger the payback from "doing it properly" from the start.

> Is Dr. Kay's/Saxonica's the only one out there?

Altova claim that their XSLT 2.0 product is schema-aware. (I've been able to
get output validation working but not input validation. If anyone can tell
me where I went wrong, please let me know!)

Michael Kay
http://www.saxonica.com/
