RE: [xsl] James Clark on Schema

Subject: RE: [xsl] James Clark on Schema
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Thu, 6 Jun 2002 09:22:19 +0100

David Carlisle:

> I have never argued that XPath2 should be based on Relax NG, 
> what I have argued is that it shouldn't be based so closely 
> on W3C Schema. In particular, concentrating for now on the 
> simple types rather than complex types (element structure) it 
> should not have accumulated the mass of numeric, date and 
> string types. They make no sense to be hardwired into a query 
> language aimed at generic XML documents. No fixed set of 
> random types will ever be of any real use in an XML context 
> as most of the documents are generated for reasons 
> unconnected with XQuery/XPath. An XML Query language has to 
> be able to cope with whatever's out there; it can't just 
> invent a world view and pretend that all documents will 
> conform.  Having a float and an integer type (already an 
> extension to XPath 1) makes sense; having byte and friends is 
> just slavish devotion to the schema spec.

If we start with the premise that the input document has been labelled
with type information by an XML Schema processor, then we have to accept
that the labels (for simple types) may be any of the 19 primitive types
defined in XML Schema, or any system-defined or user-defined subtypes of
those primitive types. So we have to define how the XPath processor will
behave when it encounters those types. This is not "slavish devotion",
it is something we have to do to make the language well-defined. 

If you read the spec carefully you will see that the actual level of
support for the more esoteric types (like gMonthDay) is absolutely minimal:
essentially, the ability to convert between these types and strings, and
to compare them for equality. I don't think we have allowed the
existence of these types to bloat the language at all. (In fact, the
behavior would probably be exactly the same, and the spec no shorter, if
we had said that gMonthDay is treated as if it were unknownSimpleType,
the type you get when there is no Schema.) 
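
To make "minimal" concrete, here is a rough sketch in Python (not XPath) of a value type that supports only the operations mentioned above: conversion from and to a string, and comparison for equality. The class name GMonthDay and its from_string method are hypothetical names chosen for illustration, and the sketch assumes the --MM-DD lexical form without the optional timezone.

```python
import re
from dataclasses import dataclass

# Lexical form of xs:gMonthDay, ignoring the optional timezone: --MM-DD
_GMONTHDAY = re.compile(r"--(\d{2})-(\d{2})")

# Largest day number allowed in each month (February may be the 29th).
_MAX_DAY = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


@dataclass(frozen=True)  # generates __eq__ and __hash__, but no ordering
class GMonthDay:
    """Hypothetical minimal value type: string conversion and equality only."""
    month: int
    day: int

    @classmethod
    def from_string(cls, text):
        """Convert a string such as '--12-25' to a GMonthDay."""
        match = _GMONTHDAY.fullmatch(text.strip())
        if match is None:
            raise ValueError(f"not a gMonthDay: {text!r}")
        month, day = int(match.group(1)), int(match.group(2))
        if not 1 <= month <= 12 or not 1 <= day <= _MAX_DAY[month - 1]:
            raise ValueError(f"month or day out of range: {text!r}")
        return cls(month, day)

    def __str__(self):
        """Convert back to the --MM-DD string form."""
        return f"--{self.month:02d}-{self.day:02d}"
```

Two values built from the same string compare equal, str() gives back the --MM-DD form, and an ordering comparison such as a < b raises TypeError because no ordering is defined.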

Much the same is true of the built-in types like xsd:short. At present
these do have a degree of support that user-defined types don't have,
namely a built-in constructor, but this is quite likely to change.

> Support for complex types also massively complicates the 
> specification for little to no benefit. If you want to find 
> all children elements that have a parent parent, it is far 
> more natural to use XPath parent/child. I see no occasions 
> when it would be more natural (in an XPath context) to define 
> a schema type for that construct and then query on the 
> type. 

I agree that the rules for complex type matching at present seem awfully
complicated, and I think we should explore whether there are viable
simplifications, such as dropping support for anonymous types. Such
simplifications might force people to write their schemas in a
particular way to make them suitable for use with XPath, which might not
be a bad thing if we get it right.

I think the principle of matching on types rather than on names is
sound, though. When XML is used in application integration scenarios the
size of the vocabulary can reach tens of thousands of element and
attribute names, but the number of types is typically much smaller. I
think that being able to write template rules that transform a date, or
a national insurance number, regardless of the name of the element or
attribute that contains it, will be a significant benefit for people
managing big integration projects. I've done some consultancy work with
a Software AG client recently where it would definitely have been a
help.
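
The idea of matching on types rather than names can be sketched in Python (this is an illustration, not XPath or XSLT syntax). The TYPE_OF table is a hypothetical stand-in for the type labels a schema processor would attach, and the element names, the format_date rule and the output date format are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema validation: element name -> type label.
# In a real vocabulary there would be far more names than types.
TYPE_OF = {
    "orderDate": "xs:date",
    "shipDate": "xs:date",
    "invoiceDate": "xs:date",
    "customer": "xs:string",
}


def format_date(text):
    """Rewrite an ISO date (YYYY-MM-DD) as DD/MM/YYYY."""
    year, month, day = text.split("-")
    return f"{day}/{month}/{year}"


# One rule per type, however many element names carry that type.
RULES = {"xs:date": format_date}


def transform(root):
    """Apply the rule for each element's type label, ignoring its name."""
    for elem in root.iter():
        rule = RULES.get(TYPE_OF.get(elem.tag))
        if rule is not None and elem.text:
            elem.text = rule(elem.text.strip())
    return root


doc = ET.fromstring(
    "<order><orderDate>2002-06-06</orderDate>"
    "<customer>ACME</customer>"
    "<shipDate>2002-06-10</shipDate></order>"
)
transform(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Adding another date-valued element to the vocabulary only needs a new entry in the type table; the single date rule covers it without change.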

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

