Re: [xsl] XSLT 2 and Invalid documents

Subject: Re: [xsl] XSLT 2 and Invalid documents
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Sat, 15 Feb 2003 17:52:41 +0000

Hi Elliotte,

> What happens in XSLT 2 when a schema is applied to assign types and
> the document is invalid, either a little or a lot? Does the
> transformation fail? Or does it proceed with whatever information it
> has been able to deduce from the document and its schema? What if
> the processor has been able to assign types to some elements and not
> to others? And in general, how does the calculation of a PSVI depend
> on validity?

These are good questions. XML Schema and the PSVI support the notion
of partial validity, so in your example:

> Practical example:
>
> <product>
>    <quantity>1</quantity>
> </product>
>
> Suppose product is declared to have two required child elements,
> name and quantity. name is declared to be a string. quantity is
> declared to be an int. When the processor reads the above document,
> will it still assign the type int to the quantity element?

I believe that the <quantity> element would be valid, with a type of
xs:int, but the <product> element would be invalid (it's a little
unclear from the XML Schema spec, to say the least...). I really can't
tell what should happen with:

> <product>
>    <quantity>1</quantity>
>    <quantity>3</quantity>
>    <quantity>3.4</quantity>
>    <quantity>Hello</quantity>
> </product>

(I suggest asking on the xmlschema-dev@xxxxxx mailing list.) With:

> <products>
>    <quantity>1</quantity>
> </products>

assuming lax validation, the <products> element would be assumed to
have the type xs:anyType in the absence of a declaration saying
otherwise, and it would then depend on whether the <quantity> element
was declared globally or locally. If globally, it would be validated
and assigned the type xs:int; if locally, it would not be validated
and would be assigned the type xs:anyType.
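To make the global/local distinction concrete, here is a minimal
schema sketch. It is an illustration only: the no-namespace schema and
the layout of the declarations are assumptions, not something taken
from your message.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global declaration: a direct child of xs:schema. Under lax
       validation of an undeclared <products> element, a <quantity>
       child can be matched against this declaration, validated, and
       assigned the type xs:int. -->
  <xs:element name="quantity" type="xs:int"/>

  <!-- Local declarations: <name> and <quantity> here are visible only
       inside the content model of <product>. If the global declaration
       above were deleted, a <quantity> inside an undeclared <products>
       would find no declaration and would be left as xs:anyType. -->
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="quantity" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```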

Within XPath 2.0, the relevant issue is #96 of the Formal Semantics [1],
which was resolved as follows:

  Resolution: XQuery supports schema-less document, and valid
  documents. It does not support invalid document, I.e., document with
  a schema for which validation fails. It can support those documents
  in a well-formed manner.

Thus in your example, since the <product> element is invalid, the
partially valid PSVI cannot be loaded into an XPath 2.0 data model
as-is. To use the document, it needs to be loaded as if it were just
well-formed, which means that <quantity> (and <product>) are assigned
the type xs:anyType.

As far as XSLT 2.0 processors go, the construction of the data model
and what happens if that fails because of the invalidity of the
document is really out of scope. XSLT 2.0 deals with the
transformation of a node tree conforming to the data model into a
result tree, and that result tree's serialisation. It doesn't talk
about the construction of the node tree from an XML document.

Personally, I think that XSLT 2.0 processors should give users the
option of whether to ignore any xsi:schemaLocation attributes in a
source XML document or not. In the majority of situations, I would
imagine that any validation should be controlled from within the
stylesheet, where you can control both the schema against which the
document is validated and what happens if it fails.
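As a rough sketch of what stylesheet-controlled validation could look
like, the following uses the xsl:import-schema declaration and the
validation attribute as they appear in the final XSLT 2.0
Recommendation, which may differ from the working draft current at the
time of writing. The file names product.xsd and products.xml and the
template name "main" are hypothetical, and a schema-aware processor is
assumed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The stylesheet, not xsi:schemaLocation in the source document,
       chooses the schema (hypothetical location). -->
  <xsl:import-schema schema-location="product.xsd"/>

  <xsl:template name="main">
    <!-- Copy the source document with strict validation. If the
         document is invalid, the processor raises an error here
         instead of building a partially typed tree. -->
    <xsl:variable name="typed" as="document-node()">
      <xsl:copy-of select="doc('products.xml')" validation="strict"/>
    </xsl:variable>
    <xsl:apply-templates select="$typed/product"/>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch a validation failure simply stops the transformation;
using validation="strip" instead would process the same document as
untyped, well-formed input.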

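Either way, though, you cannot use invalid (or partially valid)
documents as typed input in XSLT 2.0; they can only be processed as
if they were just well-formed, without their type annotations.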

Cheers,

Jeni

[1] http://www.w3.org/TR/query-semantics/#Issue-0096

---
Jeni Tennison
http://www.jenitennison.com/
