Subject: Re: [xsl] Schema-aware validation of XHTML result-document
From: "Jesper Tverskov" <jesper@xxxxxxxxxxx>
Date: Fri, 9 Mar 2007 12:44:11 +0100
I have already corrected the typo in the article. Yes, articles will always have "bugs" too.
I have not been using the command line but the excellent XML editor Oxygen and its default setup for Saxon SA. In Oxygen's configuration menu for Saxon SA, validation of the input document is on by default.
I must repeat: in Oxygen, Saxon's error messages for both error reporting modes are great and the place of error is highlighted in the stylesheet when I use compile time validation.
When I use runtime validation the error message is only good when the errors are not treated as warnings, but bad in the "warning" mode: just "One or more validation errors were reported". In both modes nothing is highlighted in the stylesheet.
AltovaXML in XMLSpy highlights the error in the stylesheet for runtime validation, but AltovaXML does not have compile-time validation, which works so well in Saxon and which I find more useful.
I understand that XHTML is a special XML case, but I don't think we disagree on anything here. XSLT processors should be improved so that XHTML result-documents really are valid when the processor says they are.
The final issue is all that junk, default attributes and fixed attributes, copied out of the schema and into the result-document. I hate it like mad when transforming XHTML to XHTML but at least I can easily get rid of it by adding extra templates to my stylesheet.
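As a rough sketch of the kind of extra templates meant here, assuming the stylesheet is built on an identity transform and that the only schema-supplied values to drop are colspan="1" and rowspan="1" (other defaulted or fixed attributes would need their own patterns):

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress attributes that only carry the schema default.
       string(.) is used so the comparison also works when the
       attribute has a numeric type annotation after validation. -->
  <xsl:template match="@colspan[string(.) = '1'] | @rowspan[string(.) = '1']"/>

</xsl:stylesheet>
```

Note that this cannot tell a default inserted by the validator from a colspan="1" written by the author: both are dropped.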
Adding templates does not help with validation of XHTML result-documents, however. Here, as Michael suggests, it is necessary to modify the schema. Are you serious? Most of us can easily add templates to our stylesheets, but it is not a proper way forward in most use cases to open up schemas and modify them just to do result-document validation.
It might be against the last detail of the spec, but AltovaXML has taken this approach: why not be better than the spec, when it is so wrong that validation of result-documents becomes almost a joke if we follow the spec to the letter?
Why not a new "being better than the spec" mode also for Saxon, some parameter to use at the command line?
Cheers, Jesper Tverskov
There are some good points here about what can and can't be achieved with schema-awareness. But there seem to be one or two observations that result from your pressing the wrong buttons - always a hazard when you try out a new piece of technology.
1. You have the rather curious statement:
"In Saxon the input document must also be XHTML or the schema of the input document must also be imported or the -vlax parameter must be used at the command line or the -val parameter must not be used in order to turn input validation off."
And later in section 5 you say "we must ... turn the validation of input documents off". But validation of input documents is off by default, so I think this gives a wrong impression. What you are really saying is: if you ask for validation you must supply a schema. (Note also there are several other ways you can provide it, for example using schemaLocation in the source document or via the Java API).
2. You say:
"In Saxon we must use a parameter at the command line to treat errors more like warnings. Now the error message is useless, "one or more errors found", and nothing is highlighted in the stylesheet."
Basically I think this must be a case of you pressing the wrong buttons.
(a) for "must use" read "can also use".
(b) Saxon doesn't have a GUI, so it isn't going to highlight anything in the stylesheet: that's the job of the IDEs that integrate Saxon, such as Stylus Studio and Oxygen. Saxon does however produce detailed error messages about where the errors appear. By default these are written to the standard error stream, and if you didn't see these messages then it's because you either directed them somewhere else, or you somehow didn't see the contents of the standard error stream. I've given some examples of how the errors should appear on the console in a footnote to this message.
3. You say:
"Saxon has also compile-time validation, that is, the errors are reported right away, and you don't need to start the transformation process. To trigger it you must use the validation attribute in all the top-elements generated by templates or the xsl:validation attribute if the top-elements are generated the literal way."
Yes: this is a limitation of the approach. Clearly Saxon can't issue an error message if your code is correct according to the language spec. I think it's an inherent aspect of the very dynamic nature of the template mechanism that you can't be sure at compile time that a template is generating invalid output unless it declares the type of output it is designed to generate. There are some cases where Saxon gets round this by generating compile time warnings if the code looks implausible, even though it might be correct according to the spec. I think this might be a way forward to reduce this problem.
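To make the two forms concrete, here is a minimal sketch of both ways of declaring the expected output type. The schema file name xhtml1-strict.xsd and the template name computed-page are assumptions made for illustration, not taken from the article:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Assumed local copy of the XHTML schema; the file name is hypothetical. -->
  <xsl:import-schema namespace="http://www.w3.org/1999/xhtml"
                     schema-location="xhtml1-strict.xsd"/>

  <!-- Literal result element: the attribute is xsl:validation. -->
  <xsl:template match="/">
    <html xsl:validation="strict">
      <head><title>A list of functions</title></head>
      <body><p>Generated with a literal result element.</p></body>
    </html>
  </xsl:template>

  <!-- Element built by an instruction: the attribute is plain validation. -->
  <xsl:template name="computed-page">
    <xsl:element name="html" namespace="http://www.w3.org/1999/xhtml"
                 validation="strict">
      <head><title>A list of functions</title></head>
      <body><p>Generated with xsl:element.</p></body>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

In both cases the declaration sits on the element that is being constructed, which is what gives a schema-aware processor something to check the content against before the transformation is run.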
4. You say:
"If namespace declarations other than for XHTML are copied to the result-document it becomes not valid XHTML 1.0. This is not nice when both processors have just reported "no validation errors"."
Agreed - another usability problem. You're presumably aware of the reason: to be "valid XHTML 1.0" you need to do more than conform to the XHTML schema, you also need to get your namespace prefixes right, and schema validity offers no guarantee of that. Although there's no support for this in the XSLT language spec, I think it would be possible for products like Saxon to offer users a bit more help here, by treating XHTML output as a special case.
5. In your example in section 5, you say "Note the space="fixed" in the style element making the output invalid.". Actually it is space="preserve". This attribute has been added to the output by the schema validator because the schema defines a fixed/default value for this attribute. Yes: it should be xml:space="preserve": a bug indeed. Please feel free to use the regular reporting channels when you find a bug, I think you will find they work very effectively.
You say "and note all the colspan="1" and rowspan="1" junk", but don't really explain what causes this. The schema defines <xs:attribute name="colspan" default="1" type="Number"/>, so validation is going to insert the default value (just as DTD validation would). One advantage of schemas over DTDs here is that it's much easier to produce a version of the schema that removes the fixed and default values, to avoid this effect happening if you don't want it.
You complain about this again in section 6 "Saxon insists in copying all that dirt out of the schema and into the result-document". Sorry, but it's required for conformance with the specs. A product that validates against a schema without expanding the fixed and default values defined in the schema is not conformant. If the validation were happening on the input side, your stylesheet would be entitled to rely on seeing the default values and would break if they weren't there. If you don't want this to happen, define a schema that doesn't include the fixed and default values.
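One way such a schema could be produced is with a stylesheet run over the schema document itself. The following is only a sketch, assuming the defaults of interest are all declared on xs:attribute elements in a single schema document (included or imported schema documents would each need the same treatment):

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Identity template: copy the schema unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop default and fixed values from attribute declarations,
       so that validation no longer adds them to the instance. -->
  <xsl:template match="xs:attribute/@default | xs:attribute/@fixed"/>

</xsl:stylesheet>
```

Bear in mind that removing fixed also stops the schema from enforcing those values, so the result is a slightly more permissive schema than the original, not just a quieter one.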
Footnote
========
Here are some examples of error messages:
(i) a validity problem with an input document:
java net.sf.saxon.Transform -im single-doc -val -o c:\temp\out.html conformance.xml render-page2.xsl
Validation error on line 22 column 89 of file:/c:/MyJava/doc/saxon8/changes.xml:
  XTTE1510: The content model for element <li> does not allow character content (See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.3)
Error on line 346 of file:/c:/MyJava/doc/saxon8/render-page2.xsl:
  FODC0005: ValidationException: The content model for element <li> does not allow character content
(2 messages, one giving the location in a source document, the other the location in the stylesheet that caused this source document to be read)
(ii) a validity problem with the output that can be detected at compile time:
Error on line 20 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
  XTTE1510: Element h:tittle is not permitted in the content model of the complex type of element head
Failed to compile stylesheet. 1 error detected.
Note how the error message points to the place in the stylesheet where the error occurs. The offending line is this:
<h:html xsl:validation="strict">
  <h:head><h:tittle>A list of functions</h:tittle></h:head>
  <h:body>
(iii) a run-time validity problem with the output:
Validation error on line 38 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
  XTTE1510: In content of element <body>: The content model does not allow element <div> to appear here. Expected one of:
  {http://www.w3.org/1999/xhtml}blockquote, {http://www.w3.org/1999/xhtml}dfn, {http://www.w3.org/1999/xhtml}br, {http://www.w3.org/1999/xhtml}h6, {http://www.w3.org/1999/xhtml}p, {http://www.w3.org/1999/xhtml}sup, {http://www.w3.org/1999/xhtml}hr, [other possibilities snipped]
  (See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.4)
Transformation failed: Run-time errors were reported
The error message here points to a line in the stylesheet that does:
<xsl:copy-of select="*"/>
- the error arises because the <div> element being copied is in the wrong namespace.
(iv) Same as (iii), but with the -vw (validation warnings) option on the command line:
Same messages on the console, but this time the invalid output HTML is written to the requested destination, with embedded comments. The relevant section of the output file looks like this:
<h:h1>fn:collection() => node()*</h:h1>
<!-- VALIDATION ERROR: In content of element <body>: The content model does not allow element <div> to appear here. Expected one of:
  {http://www.w3.org/1999/xhtml}blockquote, {http://www.w3.org/1999/xhtml}dfn, {http://www.w3.org/1999/xhtml}br, [list snipped] {http://www.w3.org/1999/xhtml}samp -->
<div xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Anyway, thanks for the feedback. It's good to see schema-aware processing getting some discussion. There are real benefits, but as you point out there are also limitations and things to learn about what works well and what doesn't. There are also opportunities for products to go beyond the spec - checking for XHTML validity being an obvious example.
Michael Kay
Saxonica Limited
-- Jesper Tverskov
www.xmlkurser.dk www.xmlplease.com