RE: [xsl] Schema-aware validation of XHTML result-document

Subject: RE: [xsl] Schema-aware validation of XHTML result-document
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 9 Mar 2007 10:22:53 -0000

There are some good points here about what can and can't be achieved with
schema-awareness. But there seem to be one or two observations that result
from your pressing the wrong buttons - always a hazard when you try out a
new piece of technology.

1. You have the rather curious statement:

"In Saxon the input document must also be XHTML or the schema of the input
document must also be imported or the -vlax parameter must be used at the
command line or the -val parameter must not be used in order to turn input
validation off."

And later in section 5 you say "we must ... turn the validation of input
documents off". But validation of input documents is off by default, so I
think this gives a wrong impression. What you are really saying is: if you
ask for validation you must supply a schema. (Note also there are several
other ways you can provide it, for example using schemaLocation in the
source document or via the Java API).
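
For illustration, a source document can name its own schema through the
xsi:schemaLocation attribute. This is only a sketch: the namespace URI, the
element names, and the schema file name doc.xsd are invented for the example.

```xml
<!-- Hypothetical source document; namespace URI and doc.xsd are placeholders -->
<doc xmlns="http://example.com/ns/doc"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://example.com/ns/doc doc.xsd">
  <title>Conformance</title>
</doc>
```

The attribute value is a list of pairs: a namespace URI followed by the
location of a schema document for that namespace.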

2. You say:

"In Saxon we must use a parameter at the command line to treat errors more
like warnings. Now the error message is useless, "one or more errors found",
and nothing is highlighted in the stylesheet."

Basically I think this must be a case of you pressing the wrong buttons.

(a) for "must use" read "can also use".

(b) Saxon doesn't have a GUI, so it isn't going to highlight anything in the
stylesheet: that's the job of the IDEs that integrate Saxon, such as Stylus
Studio and Oxygen. Saxon does however produce detailed error messages about
where the errors appear. By default these are written to the standard error
stream, and if you didn't see these messages then it's because you either
directed them somewhere else, or you somehow didn't see the contents of the
standard error stream. I've given some examples of how the errors should
appear on the console in a footnote to this message.

3. You say:

"Saxon has also compile-time validation, that is, the errors are reported
right away, and you don't need to start the transformation process. To
trigger it you must use the validation attribute in all the top-elements
generated by templates or the xsl:validation attribute if the top-elements
are generated the literal way."

Yes: this is a limitation of the approach. Clearly Saxon can't issue an
error message if your code is correct according to the language spec. I
think it's an inherent aspect of the very dynamic nature of the template
mechanism that you can't be sure at compile time that a template is
generating invalid output unless it declares the type of output it is
designed to generate. There are some cases where Saxon gets round this by
generating compile time warnings if the code looks implausible, even though
it might be correct according to the spec. I think this might be a way
forward to reduce this problem.
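
As a rough sketch of what "declaring the type of output" can look like in an
XSLT 2.0 stylesheet: the schema file name xhtml1-strict.xsd and the source
element "function" with its "name" attribute are assumptions made up for the
example, not something taken from your stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- schema-location is a placeholder: point it at your copy of the XHTML schema -->
  <xsl:import-schema namespace="http://www.w3.org/1999/xhtml"
                     schema-location="xhtml1-strict.xsd"/>

  <!-- Literal result element: xsl:validation requests validation of this subtree -->
  <xsl:template match="/">
    <h:html xsl:validation="strict">
      <h:head><h:title>A list of functions</h:title></h:head>
      <h:body>
        <xsl:apply-templates select="//function"/>
      </h:body>
    </h:html>
  </xsl:template>

  <!-- Computed element: the validation attribute plays the same role, and
       the as attribute declares what the template is meant to return -->
  <xsl:template match="function" as="element(h:p)">
    <xsl:element name="h:p" validation="strict">
      <xsl:value-of select="@name"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Without such declarations the processor has nothing to check the template
against until the result tree is actually built.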

4. You say:

"If namespace declarations other than for XHTML are copied to the
result-document it becomes not valid XHTML 1.0. This is not nice when both
processors have just reported "no validation errors"."

Agreed - another usability problem. You're presumably aware of the reason:
to be "valid XHTML 1.0" you need to do more than conform to the XHTML
schema, you also need to get your namespace prefixes right, and schema
validity offers no guarantee of that. Although there's no support for this
in the XSLT language spec, I think it would be possible for products like
Saxon to offer users a bit more help here, by treating XHTML output as a
special case.
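
Until then, the stylesheet author can keep stray namespace declarations out
of the result using standard XSLT 2.0 features. A minimal sketch, assuming
the source document is itself XHTML and that the xs prefix is declared in the
stylesheet only for the stylesheet's own use:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xs">

  <xsl:template match="/">
    <!-- exclude-result-prefixes stops the xs declaration being copied
         onto literal result elements such as html -->
    <html>
      <head><title>Example</title></head>
      <body>
        <!-- copy-namespaces="no" drops namespace nodes that the copied
             elements and attributes do not need for their own names -->
        <xsl:copy-of select="/*/*:body/*" copy-namespaces="no"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Note that copy-namespaces="no" also discards namespaces used only inside
content (for example in QName-valued attributes), so it is not always safe.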

5. In your example in section 5, you say "Note the space="fixed" in the
style element making the output invalid.". Actually it is space="preserve".
This attribute has been added to the output by the schema validator because
the schema defines a fixed/default value for this attribute. Yes: it should
be xml:space="preserve": a bug indeed. Please feel free to use the regular
reporting channels when you find a bug, I think you will find they work very
effectively.

You say "and note all the colspan="1" and rowspan="1" junk", but don't
really explain what causes this. The schema defines <xs:attribute
name="colspan" default="1" type="Number"/>, so validation is going to insert
the default value (just as DTD validation would). One advantage of schemas
over DTDs here is that it's much easier to produce a version of the schema
that removes the fixed and default values, to avoid this effect happening if
you don't want it.

You complain about this again in section 6 "Saxon insists in copying all
that dirt out of the schema and into the result-document". Sorry, but it's
required for conformance with the specs. A product that validates against a
schema without expanding the fixed and default values defined in the schema
is not conformant. If the validation were happening on the input side, your
stylesheet would be entitled to rely on seeing the default values and would
break if they weren't there. If you don't want this to happen, define a
schema that doesn't include the fixed and default values. 
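
Because a schema is itself an XML document, such a version can be produced
with a small stylesheet. The following is a minimal sketch (an identity
transform plus one empty template), to be run with the schema document as
its input. It only touches attribute declarations, and it has not been
tried against the actual XHTML schema.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Identity template: copy everything as it stands -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop default and fixed values from attribute declarations -->
  <xsl:template match="xs:attribute/@default | xs:attribute/@fixed"/>

</xsl:stylesheet>
```

Bear in mind that removing a fixed value also removes the constraint it
expresses: the modified schema will accept other values for that attribute.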

Footnote
========

Here are some examples of error messages:

(i) a validity problem with an input document:

java net.sf.saxon.Transform -im single-doc -val -o c:\temp\out.html
conformance.xml render-page2.xsl

Validation error on line 22 column 89 of
file:/c:/MyJava/doc/saxon8/changes.xml:
  XTTE1510: The content model for element <li> does not allow character
content (See
  http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.3)
Error on line 346 of file:/c:/MyJava/doc/saxon8/render-page2.xsl:
  FODC0005: ValidationException: The content model for element <li> does not
allow character content

(2 messages, one giving the location in a source document, the other the
location in the stylesheet that caused this source document to be read)

(ii) a validity problem with the output that can be detected at compile
time:

Error on line 20 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
  XTTE1510: Element h:tittle is not permitted in the content model of the
complex type of element head
Failed to compile stylesheet. 1 error detected.

Note how the error message points to the place in the stylesheet where the
error occurs. The offending line is this:

<h:html xsl:validation="strict">
    <h:head><h:tittle>A list of functions</h:tittle></h:head>
    <h:body>

(iii) a run-time validity problem with the output:

Validation error on line 38 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
  XTTE1510: In content of element <body>: The content model does not allow
element <div> to
  appear here. Expected one of: {http://www.w3.org/1999/xhtml}blockquote,
  {http://www.w3.org/1999/xhtml}dfn, {http://www.w3.org/1999/xhtml}br,
  {http://www.w3.org/1999/xhtml}h6, {http://www.w3.org/1999/xhtml}p,
  {http://www.w3.org/1999/xhtml}sup, {http://www.w3.org/1999/xhtml}hr,
  [other possibilities snipped]
  (See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type
  clause 2.4)
Transformation failed: Run-time errors were reported

The error message here points to a line in the stylesheet that does:

  <xsl:copy-of select="*"/>

- the error arises because the <div> element being copied is in the wrong
namespace.

(iv) Same as (iii), but with the -vw (validation warnings) option on the
command line:

Same messages on the console, but this time the invalid output HTML is
written to the requested destination, with embedded comments. The relevant
section of the output file looks like this:

      <h:h1>fn:collection() =&gt; node()*</h:h1>
      <!--
VALIDATION ERROR: In content of element <body>: The content model does not
allow element <div> to appear here. Expected one of:
{http://www.w3.org/1999/xhtml}blockquote, {http://www.w3.org/1999/xhtml}dfn,
{http://www.w3.org/1999/xhtml}br, [list snipped]
{http://www.w3.org/1999/xhtml}samp
-->
      <div xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"


Anyway, thanks for the feedback. It's good to see schema-aware processing
getting some discussion. There are real benefits, but as you point out there
are also limitations and things to learn about what works well and what
doesn't. There are also opportunities for products to go beyond the spec -
checking for XHTML validity being an obvious example.

Michael Kay
Saxonica Limited
