RE: [xsl] xmlns created literally

Subject: RE: [xsl] xmlns created literally
From: Owen Rees <owen.rees@xxxxxx>
Date: Thu, 26 Feb 2009 15:26:36 +0000
--On Thursday, February 26, 2009 12:34:36 +0000 Michael Kay wrote:

> I think the output that shows the namespace URI percent-encoded is wrong.
> I can't see any justification for percent-encoding a namespace URI under
> any circumstances.

According to "Namespaces in XML 1.0 (Second Edition)", "An XML namespace is identified by a URI reference [RFC3986]". '{' and '}' are not valid characters in a URI according to RFC3986.


Switching to XML 1.1 does not make '{' or '}' acceptable either. "Namespaces in XML 1.1 (Second Edition)" says "An XML namespace is identified by an IRI reference [RFC3987]". RFC3987 section 3.1 "Mapping of IRIs to URIs" says:

  Systems accepting IRIs MAY also deal with the printable characters in
  US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
  "{", "}", "|", "\", "^", and "`", in step 2 above.

These characters are mentioned only as an optional extra for the mapping step, so '{' and '}' are not valid characters in an IRI either.
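To make the character argument concrete, here is a minimal sketch in Python that checks a candidate namespace name against the RFC 3986 character repertoire. It is only a character-level check, not a full parse against the URI grammar, and the helper name chars_not_allowed_in_uri is made up for illustration.

```python
import string

# Character repertoire of RFC 3986: unreserved, reserved (gen-delims and
# sub-delims), plus "%" for percent-escapes.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RESERVED = set(":/?#[]@" "!$&'()*+,;=")
URI_CHARS = UNRESERVED | RESERVED | {"%"}


def chars_not_allowed_in_uri(name):
    """Return the distinct characters of `name` that RFC 3986 does not allow
    anywhere in a URI, in order of first appearance (hypothetical helper)."""
    found = []
    for ch in name:
        if ch not in URI_CHARS and ch not in found:
            found.append(ch)
    return found


# Printable US-ASCII characters that fall outside the URI repertoire.
EXCLUDED_ASCII = [chr(c) for c in range(0x20, 0x7F) if chr(c) not in URI_CHARS]

print(chars_not_allowed_in_uri("{$x}"))
print(chars_not_allowed_in_uri("http://example.com/ns"))
print(EXCLUDED_ASCII)
```

The excluded printable characters computed this way are the same ten that the RFC 3987 passage quoted above lists.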

In XSLT 2.0 namespace names can be created by both the xsl:namespace instruction and the namespace attribute of the xsl:element instruction. In both cases the string value is required to be in the lexical space of the xs:anyURI data type.

The lexical space of the xs:anyURI data type allows characters that are permitted neither in URIs nor in IRIs. The string "{$x}" is in the lexical space of xs:anyURI but is not a valid IRI. Neither XML namespace specification mentions the xs:anyURI type or anything equivalent to it, so XSLT 2.0 gives us a way to construct a namespace node whose namespace name is not valid in XML.

When serializing to XML, what is a serializer to do with namespace names that are not valid according to the relevant "Namespaces in XML" definition?

The options seem to me to be:

1) emit the string in its invalid form. Output is not valid XML - I don't like this one.
2) report an error. Rather harsh but it does point out the error.
3) apply the mapping rule with the permitted extension, report an error if the result is not a URI. Output is valid, at least in the case in question, but it hides the error.
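As a rough illustration of option 3, the sketch below percent-encodes everything outside the RFC 3986 repertoire (including non-ASCII characters, as UTF-8) and then applies a deliberately simplified validity check. The function name map_to_uri is hypothetical, and checking only for malformed percent-escapes is my assumption of a minimal test; a real serializer would want a full check against the URI grammar.

```python
import re
from urllib.parse import quote

# Characters quote() must leave alone: the RFC 3986 reserved set plus "%".
# quote() never encodes ASCII letters, digits, "-", ".", "_" or "~".
KEEP = ":/?#[]@!$&'()*+,;=%"


def map_to_uri(name):
    """Percent-encode characters not allowed in a URI (hypothetical helper).

    Raises ValueError if the result still contains a "%" that does not
    start a two-digit hex escape. This is a simplified stand-in for
    "report an error if the result is not a URI".
    """
    mapped = quote(name, safe=KEEP)
    if re.search(r"%(?![0-9A-Fa-f]{2})", mapped):
        raise ValueError("not a URI after mapping: %r" % mapped)
    return mapped


print(map_to_uri("{$x}"))
print(map_to_uri("http://example.com/a b"))
```

For the string under discussion this gives "%7B$x%7D", which matches the percent-encoded form of "{$x}".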


When generating XML 1.1 there is a variant of option 3: apply the mapping rule extension only to the printable characters that are not valid in an IRI, so that the result is a valid IRI.
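A minimal sketch of that XML 1.1 variant, under the assumption that only the ten printable US-ASCII characters listed in RFC 3987 section 3.1 need escaping and that non-ASCII characters are left as they are. The name map_to_iri is hypothetical, and the sketch does not check that the rest of the string is a valid IRI.

```python
# The printable US-ASCII characters that RFC 3987 section 3.1 lists as not
# allowed in URIs: < > " space { } | \ ^ `
EXTRA_ASCII = '<>" {}|\\^`'


def map_to_iri(name):
    """Percent-encode only the listed ASCII characters (hypothetical helper).

    Non-ASCII characters pass through untouched, since an IRI may contain
    them directly.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in EXTRA_ASCII else ch
        for ch in name
    )


print(map_to_iri("{$x}"))
print(map_to_iri("http://example.com/\u00e9{1}"))
```

The difference from the URI mapping is that a character such as "é" survives unchanged instead of becoming "%C3%A9".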

In the example in question, it seems that one serializer has opted to emit invalid XML, while the other has applied the mapping rule, percent-encoding the characters that are not valid in a URI without giving a warning. It also seems that in both cases the XML parser has accepted the invalid XML without any error or warning.

--
Owen Rees; speaking personally, and not on behalf of HP.
========================================================
Hewlett-Packard Limited.   Registered No: 690597 England
Registered Office:  Cain Road, Bracknell, Berks RG12 1HN
