Schemas in XSLT 2.0 (Was: Re: [xsl] keys and idrefs - XSLT2 request?)

Subject: Schemas in XSLT 2.0 (Was: Re: [xsl] keys and idrefs - XSLT2 request?)
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Wed, 10 Oct 2001 16:12:12 +0100

David C. wrote:
> Using keys in XSLT turns out to be a lot more useful than using id(),
> not just because keys are more general but, importantly, because a
> large part of XSLT processing is done with non-validating parsers
> that might or might not read any DTD.

I think that there are two issues here:

  1. schema/DTD support
  2. schema/DTD availability/reliability

What David's bringing out is the fact that not all XSLT 1.0 processors
use validating parsers all the time. When a validating parser is used
with XSLT, the point is not primarily that it checks that the document
adheres to the DTD, but that it makes the XSLT processor aware that
particular attributes are ID attributes and presents default/fixed
attributes as if they were part of the original document.
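To make David's point concrete, here is a minimal Python sketch using the standard-library ElementTree, which (like the non-validating parsers he mentions) has no notion of ID-typed attributes. It mimics the idea behind a declaration such as <xsl:key name="by-id" match="item" use="@id"/>: the index is defined by the code doing the processing, not by a DTD. The sample document and the helper names build_key and lookup are hypothetical; the split() step stands in for the whitespace tokenizing that id() applies to IDREFS values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: no DTD is declared, so nothing tells the
# parser that "id" is an ID attribute or that "refs" holds IDREFS.
DOC = """<doc>
  <item id="a">Apple</item>
  <item id="b">Banana</item>
  <list refs="a b"/>
</doc>"""


def build_key(root, match_tag, use_attr):
    """Index elements named match_tag by the value of use_attr,
    roughly like <xsl:key name="..." match="item" use="@id"/>."""
    index = {}
    for el in root.iter(match_tag):
        value = el.get(use_attr)
        if value is not None:
            index.setdefault(value, []).append(el)
    return index


def lookup(index, value):
    """Return every indexed element whose key value equals value."""
    return index.get(value, [])


root = ET.fromstring(DOC)
by_id = build_key(root, "item", "id")

# IDREFS-style resolution: we split the whitespace-separated list ourselves,
# because here the processing code, not a DTD, decides how "refs" is read.
refs = root.find("list").get("refs").split()
resolved = [el.text for ref in refs for el in lookup(by_id, ref)]
print(resolved)  # ['Apple', 'Banana']
```

Because the index names the attribute explicitly, the result is the same whether or not a DTD is present, which is the guarantee id() cannot give on its own.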

As we move into schema-awareness in XSLT/XPath 2.0, we're expecting
not only that XSLT processors will have access to schema-validating
parsers that *validate* the XML document, but also that those parsers
will make the post-schema-validation infoset (PSVI) available to the
XSLT processor. It's a pretty tall order to put together an XML Schema
validator, let alone one that makes the PSVI available through some
kind of API (it looks like DOM3 will offer one with Abstract Schemas;
I don't know whether an equivalent SAX interface is being worked on).

So, the first question is whether XSLT 2.0 should mandate support for
XML Schema within XSLT processors (i.e. you'd have to be able to
validate against XML Schema in order to be a conformant XSLT 2.0
processor). I think the answer is that it shouldn't, for two reasons:

  a. It means that XSLT 2.0 processor developers will either have to
     write their own schema-validating parsers (which is a massive
     effort) or rely on the few open-source schema-validating parsers
     that are available at the moment (Xerces being the obvious one).
     I think this will severely limit the range of implementations
     supporting XSLT 2.0 (particularly in different programming
     languages).

  b. It means that XSLT 2.0 processors will be larger and less
     efficient than current ones, both because of the overhead of
     supporting XML Schema validation and, following from the first
     reason, because of the lack of competition between the few
     implementations that will be available.

It needn't necessarily be an all-or-nothing thing. Just supporting XML
Schema Part 2: Datatypes would give quite a lot of power without a
great deal of implementation effort (certainly not as much as
supporting the entirety of XML Schema).

For the continued ubiquity of XSLT, I would rather see a large range
of XSLT processors serving different markets: big processors offering
the power of schema validation to those who want it, and smaller
processors targeting quick transformations. To get that, I think
validation against XML Schemas should not be obligatory under XSLT
2.0.

The second issue is how to provide enough support that it's possible
to write XSLT/XPath that doesn't rely on a schema or DTD being present
to achieve a particular result. I was quite reassured to see that the
Functions and Operators document offers lots of casting and
constructor operators/functions, which imply that you could get the
same behaviour from the same stylesheet whether or not the schema is
there.

Another thought along these lines is how the schema is going to be
made available to the XSLT processor (or validating parser). I would
like the *XSLT stylesheet* to point to the schema that should be used
with a particular document, overriding any pointer from the *instance
document*. I think it's generally recognised that having a document
provider assert which DTD or schema a document adheres to is pretty
meaningless: you need to know whether it adheres to the DTD/schema
that the stylesheet *expects* it to adhere to.

What's more, it could be really useful to have different stylesheets
use different schemas with the same document, for example to apply
only the minimal validation that each requires, or to subtly
reclassify particular element/attribute types for different purposes.
(I'm thinking here about the support for phased validation in
Schematron and how that might apply in XSLT.) Again, having a
stylesheet assert which schema it expects a document to adhere to
would support this.

Anyway, as David says, hopefully the XPath 2.0 and XSLT 2.0 WDs will
make the intentions of the WG(s) on this topic clearer and allow us to
make more focussed comments.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

