--On Thursday, February 26, 2009 12:34:36 +0000 Michael Kay wrote:
> I think the output that shows the namespace URI percent-encoded is wrong.
> I can't see any justification for percent-encoding a namespace URI under
> any circumstances.
According to "Namespaces in XML 1.0 (Second Edition)", "An XML namespace is
identified by a URI reference [RFC3986]". '{' and '}' are not valid
characters in a URI according to RFC3986.
Switching to XML 1.1 does not make '{' or '}' acceptable either.
"Namespaces in XML 1.1 (Second Edition)" says "An XML namespace is
identified by an IRI reference [RFC3987]". RFC3987 section 3.1 "Mapping of
IRIs to URIs" says:
   Systems accepting IRIs MAY also deal with the printable characters in
   US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
   "{", "}", "|", "\", "^", and "`", in step 2 above.
So '{' and '}' are not valid characters in an IRI either.
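As a minimal illustration (the names below are hypothetical, not from any library), this Python check lists which of the ten printable US-ASCII characters quoted from RFC 3987 section 3.1 occur in a candidate namespace name. For the string in question it finds '{' and '}'.

```python
# The printable US-ASCII characters that RFC 3987 section 3.1 lists as
# not allowed in URIs; the IRI syntax has no place for them either.
NOT_ALLOWED_IN_URI_OR_IRI = frozenset('<>" {}|\\^`')

def disallowed_printable_ascii(name: str) -> list[str]:
    """Return the characters of `name` that are in the RFC 3987 section 3.1
    list, in order of appearance (duplicates kept)."""
    return [c for c in name if c in NOT_ALLOWED_IN_URI_OR_IRI]

if __name__ == "__main__":
    print(disallowed_printable_ascii("{$x}"))                   # ['{', '}']
    print(disallowed_printable_ascii("http://example.com/ns"))  # []
```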
In XSLT 2.0 namespace names can be created by both the xsl:namespace
instruction and the namespace attribute of the xsl:element instruction. In
both cases the string value is required to be in the lexical space of the
xs:anyURI data type.
The lexical space of the xs:anyURI data type allows characters that are
permitted neither in URIs nor in IRIs. The string "{$x}" is in the lexical
space of xs:anyURI but is not a valid IRI. Neither XML namespace
specification mentions the xs:anyURI type or anything equivalent to it, so
we have a way to construct a namespace node in which the namespace name is
not valid in XML.
When serializing to XML, what is a serializer to do with namespace names
that are not valid according to the relevant "Namespaces in XML" definition?
The options seem to me to be:
1) emit the string in its invalid form. The output is not valid XML; I don't
like this one.
2) report an error. Rather harsh, but it does point out the error.
3) apply the mapping rule with the permitted extension, and report an error
if the result is not a URI. The output is valid, at least in the case in
question, but it hides the error.
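The three options can be sketched as serializer policies for XML 1.0 output, where the namespace name has to be a URI reference. This is a minimal Python sketch under stated assumptions: `Policy`, `is_uri_chars`, `map_to_uri` and `namespace_name_for_xml10` are hypothetical names, the validity test is character-level only (it does not check the full RFC 3986 grammar), and as a simplification every non-ASCII character is percent-encoded as if step 2 of the RFC 3987 mapping applied to it.

```python
import string
from enum import Enum

class Policy(Enum):
    EMIT = "emit"    # option 1: write the name unchanged (output may not be valid XML)
    ERROR = "error"  # option 2: refuse to serialize an invalid name
    MAP = "map"      # option 3: apply the mapping rule plus its extension

# Printable US-ASCII characters not allowed in URIs (RFC 3987 section 3.1).
EXTENSION_CHARS = frozenset('<>" {}|\\^`')
# ASCII characters RFC 3986 allows outside percent-escapes: unreserved + reserved.
URI_ASCII = frozenset(string.ascii_letters + string.digits + "-._~"
                      + ":/?#[]@" + "!$&'()*+,;=")

def is_uri_chars(s: str) -> bool:
    """Character-level test only: every character is allowed and every '%'
    starts a two-hex-digit escape."""
    for i, c in enumerate(s):
        if c == "%":
            esc = s[i + 1:i + 3]
            if len(esc) != 2 or not all(h in string.hexdigits for h in esc):
                return False
        elif c not in URI_ASCII:
            return False
    return True

def map_to_uri(name: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character and of every
    EXTENSION_CHARS character (simplification: all non-ASCII is encoded)."""
    out = []
    for c in name:
        if ord(c) > 0x7F or c in EXTENSION_CHARS:
            out.append("".join(f"%{b:02X}" for b in c.encode("utf-8")))
        else:
            out.append(c)
    return "".join(out)

def namespace_name_for_xml10(name: str, policy: Policy) -> str:
    if policy is Policy.EMIT:
        return name                   # may yield a namespace name that is not a URI
    if policy is Policy.ERROR:
        if not is_uri_chars(name):
            raise ValueError(f"namespace name is not a URI reference: {name!r}")
        return name
    mapped = map_to_uri(name)         # Policy.MAP
    if not is_uri_chars(mapped):      # e.g. ASCII control characters survive mapping
        raise ValueError(f"namespace name cannot be mapped to a URI: {name!r}")
    return mapped
```

With `Policy.MAP`, "{$x}" silently becomes "%7B$x%7D", which is exactly the "hides the error" drawback of option 3; `Policy.ERROR` raises instead, and `Policy.EMIT` returns the string unchanged.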
When generating XML 1.1, there is a variant of option 3: apply the mapping
rule extension to the printable characters that are not valid in an IRI, so
that the result is a valid IRI.
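For the XML 1.1 variant, a sketch might percent-encode only the ten printable US-ASCII characters, leave non-ASCII characters alone, and then check the result character by character against the IRI syntax. Again this is a hypothetical Python sketch, not any serializer's actual code: the function names are illustrative, and `is_ucschar` encodes the `ucschar` ranges of RFC 3987 while ignoring `iprivate` (which RFC 3987 allows only in the query component), so private-use characters are rejected.

```python
import string

# Printable US-ASCII characters not allowed in URIs or IRIs (RFC 3987 section 3.1).
EXTENSION_CHARS = frozenset('<>" {}|\\^`')
# ASCII characters allowed outside percent-escapes: RFC 3986 unreserved + reserved.
IRI_ASCII = frozenset(string.ascii_letters + string.digits + "-._~"
                      + ":/?#[]@" + "!$&'()*+,;=")

def is_ucschar(cp: int) -> bool:
    """True if code point cp is in the RFC 3987 ucschar production."""
    if 0xA0 <= cp <= 0xD7FF or 0xF900 <= cp <= 0xFDCF or 0xFDF0 <= cp <= 0xFFEF:
        return True
    if 0xE0000 <= cp <= 0xE0FFF:      # the plane 14 range starts at U+E1000
        return False
    # Planes 1-14: U+N0000..U+NFFFD (each plane's last two code points excluded).
    return 0x10000 <= cp <= 0xEFFFD and (cp & 0xFFFF) <= 0xFFFD

def map_extension_only(name: str) -> str:
    """Percent-encode only EXTENSION_CHARS (all single-byte ASCII);
    non-ASCII characters are kept as they are."""
    return "".join(f"%{ord(c):02X}" if c in EXTENSION_CHARS else c for c in name)

def is_iri_chars(s: str) -> bool:
    """Character-level test only; does not check the full RFC 3987 grammar."""
    for i, c in enumerate(s):
        if c == "%":
            esc = s[i + 1:i + 3]
            if len(esc) != 2 or not all(h in string.hexdigits for h in esc):
                return False
        elif ord(c) < 0x80:
            if c not in IRI_ASCII:
                return False
        elif not is_ucschar(ord(c)):
            return False
    return True

def namespace_name_for_xml11(name: str) -> str:
    mapped = map_extension_only(name)
    if not is_iri_chars(mapped):      # e.g. ASCII control characters remain
        raise ValueError(f"namespace name cannot be mapped to an IRI: {name!r}")
    return mapped
```

Here "{$x}" still becomes "%7B$x%7D", but a name such as "http://example.com/é" is returned unchanged, since 'é' is a valid IRI character; for a URI it would have to be percent-encoded.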
In the example in question, it seems that one serializer has opted to emit
invalid XML and the other to apply the mapping rule to percent-encode the
characters that are not valid in a URI, without giving a warning. It also
seems that in both cases the XML parser has accepted the invalid XML
without any error or warning.
--
Owen Rees; speaking personally, and not on behalf of HP.