Subject: RE: [xsl] Schema-aware validation of XHTML result-document
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 9 Mar 2007 10:22:53 -0000

There are some good points here about what can and can't be achieved with schema-awareness. But there seem to be one or two observations that result from your pressing the wrong buttons - always a hazard when you try out a new piece of technology.

1. You have the rather curious statement: "In Saxon the input document must also be XHTML or the schema of the input document must also be imported or the -vlax parameter must be used at the command line or the -val parameter must not be used in order to turn input validation off." And later in section 5 you say "we must ... turn the validation of input documents off".

But validation of input documents is off by default, so I think this gives a wrong impression. What you are really saying is: if you ask for validation you must supply a schema. (Note also there are several other ways you can provide it, for example using schemaLocation in the source document or via the Java API).

2. You say: "In Saxon we must use a parameter at the command line to treat errors more like warnings. Now the error message is useless, "one or more errors found", and nothing is highlighted in the stylesheet."

Basically I think this must be a case of you pressing the wrong buttons.

(a) for "must use" read "can also use".

(b) Saxon doesn't have a GUI, so it isn't going to highlight anything in the stylesheet: that's the job of the IDEs that integrate Saxon, such as Stylus Studio and Oxygen. Saxon does however produce detailed error messages about where the errors appear. By default these are written to the standard error stream, and if you didn't see these messages then it's because you either directed them somewhere else, or you somehow didn't see the contents of the standard error stream. I've given some examples of how the errors should appear on the console in a footnote to this message.

3. You say: "Saxon has also compile-time validation, that is, the errors are reported right away, and you don't need to start the transformation process.
To trigger it you must use the validation attribute in all the top-elements generated by templates or the xsl:validation attribute if the top-elements are generated the literal way."

Yes: this is a limitation of the approach. Clearly Saxon can't issue an error message if your code is correct according to the language spec. I think it's an inherent aspect of the very dynamic nature of the template mechanism that you can't be sure at compile time that a template is generating invalid output unless it declares the type of output it is designed to generate. There are some cases where Saxon gets round this by generating compile time warnings if the code looks implausible, even though it might be correct according to the spec. I think this might be a way forward to reduce this problem.

4. You say: "If namespace declarations other than for XHTML are copied to the result-document it becomes not valid XHTML 1.0. This is not nice when both processors have just reported "no validation errors"."

Agreed - another usability problem. You're presumably aware of the reason: to be "valid XHTML 1.0" you need to do more than conform to the XHTML schema, you also need to get your namespace prefixes right, and schema validity offers no guarantee of that. Although there's no support for this in the XSLT language spec, I think it would be possible for products like Saxon to offer users a bit more help here, by treating XHTML output as a special case.

5. In your example in section 5, you say "Note the space="fixed" in the style element making the output invalid.". Actually it is space="preserve". This attribute has been added to the output by the schema validator because the schema defines a fixed/default value for this attribute. Yes: it should be xml:space="preserve": a bug indeed. Please feel free to use the regular reporting channels when you find a bug, I think you will find they work very effectively.

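To make the namespace issue in point 4 concrete: schema validation says nothing about which namespace declarations end up in the serialized result, so that has to be checked separately. The following is only a minimal Python sketch of such a check, not anything Saxon provides; the function name and the sample document are made up for illustration. It reports every namespace declaration other than the default (unprefixed) declaration of the XHTML namespace.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def stray_namespace_declarations(xml_text):
    """Return (prefix, uri) pairs for every namespace declaration in the
    document except the default declaration of the XHTML namespace.
    (Hypothetical helper, for illustration only.)"""
    stray = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # The "start-ns" event fires once per namespace declaration and
    # yields a (prefix, uri) tuple; the default namespace has prefix "".
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if not (prefix == "" and uri == XHTML_NS):
            stray.append((prefix, uri))
    return stray

# Made-up result document carrying a leftover xsi declaration.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<head><title>t</title></head><body><p>x</p></body></html>'
)

for prefix, uri in stray_namespace_declarations(sample):
    print("unexpected namespace declaration:", prefix, uri)
```

A document that passes this check can still be invalid XHTML 1.0 for other reasons; the sketch covers only the namespace declarations.
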
You say "and note all the colspan="1" and rowspan="1" junk", but don't really explain what causes this. The schema defines <xs:attribute name="colspan" default="1" type="Number"/>, so validation is going to insert the default value (just as DTD validation would). One advantage of schemas over DTDs here is that it's much easier to produce a version of the schema that removes the fixed and default values, to avoid this effect happening if you don't want it.

You complain about this again in section 6 "Saxon insists in copying all that dirt out of the schema and into the result-document". Sorry, but it's required for conformance with the specs. A product that validates against a schema without expanding the fixed and default values defined in the schema is not conformant. If the validation were happening on the input side, your stylesheet would be entitled to rely on seeing the default values and would break if they weren't there. If you don't want this to happen, define a schema that doesn't include the fixed and default values.

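One way to produce such a schema variant is to post-process the XSD. The following is a minimal Python sketch under stated assumptions, not a recommended tool: the function name and the cut-down schema fragment (including the attributeGroup name) are hypothetical, and only the colspan declaration is taken from the text above. It uses xml.dom.minidom so that namespace declarations such as the default xmlns, which the unprefixed type="Number" reference relies on, survive the round trip.

```python
from xml.dom import minidom

XS_NS = "http://www.w3.org/2001/XMLSchema"

def strip_attribute_defaults(xsd_text):
    """Return the schema with 'default' and 'fixed' removed from every
    xs:attribute declaration. (Hypothetical helper, for illustration.)"""
    doc = minidom.parseString(xsd_text)
    for decl in doc.getElementsByTagNameNS(XS_NS, "attribute"):
        for name in ("default", "fixed"):
            if decl.hasAttribute(name):
                decl.removeAttribute(name)
    # Serialize the root element only, without an XML declaration.
    return doc.documentElement.toxml()

# Cut-down, made-up fragment; the real XHTML schema is much larger.
sample = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns="http://www.w3.org/1999/xhtml" '
    'targetNamespace="http://www.w3.org/1999/xhtml">'
    '<xs:attributeGroup name="cellspan">'
    '<xs:attribute name="colspan" default="1" type="Number"/>'
    '<xs:attribute name="rowspan" default="1" type="Number"/>'
    '</xs:attributeGroup>'
    '</xs:schema>'
)

print(strip_attribute_defaults(sample))
```

Note that dropping fixed also relaxes validation, since the fixed value is then no longer enforced, and that this sketch leaves default and fixed on xs:element declarations untouched.
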
Footnote
========

Here are some examples of error messages:

(i) a validity problem with an input document:

  java net.sf.saxon.Transform -im single-doc -val -o c:\temp\out.html conformance.xml render-page2.xsl

  Validation error on line 22 column 89 of file:/c:/MyJava/doc/saxon8/changes.xml:
    XTTE1510: The content model for element <li> does not allow character content (See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.3)
  Error on line 346 of file:/c:/MyJava/doc/saxon8/render-page2.xsl:
    FODC0005: ValidationException: The content model for element <li> does not allow character content

(2 messages, one giving the location in a source document, the other the location in the stylesheet that caused this source document to be read)

(ii) a validity problem with the output that can be detected at compile time:

  Error on line 20 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
    XTTE1510: Element h:tittle is not permitted in the content model of the complex type of element head
  Failed to compile stylesheet. 1 error detected.

Note how the error message points to the place in the stylesheet where the error occurs. The offending line is this:

  <h:html xsl:validation="strict">
    <h:head><h:tittle>A list of functions</h:tittle></h:head>
    <h:body>

(iii) a run-time validity problem with the output:

  Validation error on line 38 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
    XTTE1510: In content of element <body>: The content model does not allow element <div> to appear here.
    Expected one of: {http://www.w3.org/1999/xhtml}blockquote, {http://www.w3.org/1999/xhtml}dfn, {http://www.w3.org/1999/xhtml}br, {http://www.w3.org/1999/xhtml}h6, {http://www.w3.org/1999/xhtml}p, {http://www.w3.org/1999/xhtml}sup, {http://www.w3.org/1999/xhtml}hr, [other possibilities snipped] (See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.4)
  Transformation failed: Run-time errors were reported

The error message here points to a line in the stylesheet that does:

  <xsl:copy-of select="*"/>

- the error arises because the <div> element being copied is in the wrong namespace.

(iv) Same as (iii), but with the -vw (validation warnings) option on the command line:

Same messages on the console, but this time the invalid output HTML is written to the requested destination, with embedded comments. The relevant section of the output file looks like this:

  <h:h1>fn:collection() => node()*</h:h1>
  <!-- VALIDATION ERROR: In content of element <body>: The content model does not allow element <div> to appear here. Expected one of: {http://www.w3.org/1999/xhtml}blockquote, {http://www.w3.org/1999/xhtml}dfn, {http://www.w3.org/1999/xhtml}br, [list snipped] {http://www.w3.org/1999/xhtml}samp -->
  <div xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

Anyway, thanks for the feedback. It's good to see schema-aware processing getting some discussion. There are real benefits, but as you point out there are also limitations and things to learn about what works well and what doesn't. There are also opportunities for products to go beyond the spec - checking for XHTML validity being an obvious example.

Michael Kay
Saxonica Limited