RE: [xsl] xs:anyURI allows a space

Subject: RE: [xsl] xs:anyURI allows a space
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 16 Jul 2007 15:49:05 +0100

The rules in XML Schema for xs:anyURI are not exactly a model of clarity,
and I have whinged about them loudly and often. 

The sentence "The mapping from anyURI values to URIs is as defined by the
URI reference escaping procedure..." means (I am told) that to get from an
xs:anyURI value to a real URI, you percent-encode the characters that are
not allowed in a URI. This includes space.
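As a rough illustration of that mapping, here is a minimal Java sketch. It assumes a simplified rule: every UTF-8 byte that is not an ASCII character permitted in an RFC 3986 URI reference gets percent-encoded, and any existing "%" is assumed to already start a %HH escape. The names UriEscaping and escapeDisallowed are hypothetical, and this is not the spec's exact procedure text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a simplified take on "percent-encode what a URI can't contain".
final class UriEscaping {
    // ASCII characters left as-is: unreserved, reserved (gen-delims and
    // sub-delims), plus '%', which is assumed to already begin an escape.
    private static final String ALLOWED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-._~:/?#[]@!$&'()*+,;=%";

    /** Percent-encodes every UTF-8 byte that is not an allowed ASCII character. */
    static String escapeDisallowed(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c < 0x80 && ALLOWED.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }
}
```

For example, escapeDisallowed("the file.xml") gives "the%20file.xml", which java.net.URI's single-argument constructor accepts; a non-ASCII character such as "é" becomes its UTF-8 bytes, "%C3%A9".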

Later on the spec says "Note: Spaces are, in principle, allowed in the
·lexical space· of anyURI, however, their use is highly discouraged (unless
they are encoded by %20)." This has caused no end of confusion, because in
fact spaces are also allowed in the value space (which is the same as the
lexical space). And of course, discouraging something in a Note doesn't make
it invalid.
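To make the mismatch concrete on the Java side, the hypothetical helper class below (a sketch, not part of any library) checks whether java.net.URI's single-argument constructor accepts a string, and builds a relative reference with the four-argument constructor, which quotes characters such as space that are illegal in a path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helpers contrasting the two java.net.URI constructors.
final class SpaceMismatch {
    /** True if the single-argument constructor parses the string as a URI reference. */
    static boolean parsesAsJavaUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false; // e.g. "Illegal character in path" for a space
        }
    }

    /** Builds a relative URI whose path is quoted by URI(scheme, host, path, fragment). */
    static String quotedRelativePath(String path) {
        try {
            return new URI(null, null, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With these helpers, "the file.xml" (a legal xs:anyURI value) is rejected by the single-argument constructor, while the four-argument route yields "the%20file.xml".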

The whitespace facet for xs:anyURI, which defines what happens to whitespace
as written in the source XML before creating the "lexical" value, is
"collapse" (see the schema-for-schemas), which is essentially the same as
XPath normalize-space(), so it can leave single spaces in the middle of the
value. There's a case to be made for collapsing whitespace before calling
the URIResolver, but I don't think it's a strong one on either theoretical
or practical grounds.
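The "collapse" rule itself is simple enough to sketch. The hypothetical helper below, assuming only the four XML whitespace characters (space, tab, CR, LF) are involved, shows why a space inside the value survives: each run of whitespace shrinks to a single space, and only a leading or trailing one is removed.

```java
// Hypothetical helper approximating whiteSpace="collapse".
final class WhitespaceFacet {
    /**
     * Each run of XML whitespace (space, tab, CR, LF) becomes one space,
     * then a leading or trailing space is deleted. Interior spaces remain.
     */
    static String collapse(String lexical) {
        String squeezed = lexical.replaceAll("[ \\t\\r\\n]+", " ");
        return squeezed.replaceAll("^ | $", "");
    }
}
```

So collapse("  the \n\t file.xml  ") returns "the file.xml", which still contains a space and is still a valid xs:anyURI.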

The JAXP spec for the URIResolver interface is not exactly crystal-clear
either. However, it doesn't mandate that the XSLT processor should do
anything to the values supplied in the stylesheet source before calling the
URIResolver, and it doesn't mandate that the values passed across should be
valid URIs (or relative references) as per the RFC. My interpretation of the
spec (and yours has just as much claim to validity...) is that it's the
responsibility of the URIResolver to do any percent-encoding that is needed,
and if you look at Saxon's StandardURIResolver you will see that it does
this. The advantage of giving the URIResolver the raw string as supplied to
xsl:include or document() is that it gives the user maximum control over how
to interpret it.
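Under that interpretation, a resolver that does its own encoding might look like the sketch below. It is hypothetical and deliberately minimal (it is not Saxon's StandardURIResolver): it assumes the only problem character in the raw href is the space and that the base arrives as an already-encoded absolute URI, replaces each space with %20, resolves against the base with java.net.URI, and returns a StreamSource for the result.

```java
import java.net.URI;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver: percent-encodes spaces in the raw href, then resolves it. */
final class SpaceEncodingResolver implements URIResolver {

    /** Replaces every space with %20; no other character is touched. */
    static String encodeSpaces(String href) {
        return href.replace(" ", "%20");
    }

    /** Resolves the encoded href against base; base may be null (no base known). */
    static URI absolutize(String href, String base) {
        URI relative = URI.create(encodeSpaces(href));
        return (base == null) ? relative : URI.create(base).resolve(relative);
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            return new StreamSource(absolutize(href, base).toString());
        } catch (IllegalArgumentException e) {
            // URI.create wraps URISyntaxException in IllegalArgumentException.
            throw new TransformerException("Cannot resolve href: " + href, e);
        }
    }
}
```

For the href "the file.xml" and base "file:/home/user/style.xsl", absolutize returns file:/home/user/the%20file.xml. A fuller resolver would also have to decide what to do with non-ASCII characters or a literal "%" that is not an escape; receiving the raw string is what leaves that decision to the resolver.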

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Andrew Welch [mailto:andrew.j.welch@xxxxxxxxx] 
> Sent: 16 July 2007 15:11
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] xs:anyURI allows a space
> 
> Why is it that xs:anyURI allows a space, but Java's URI
> class does not (when using the single-argument constructor)?
> 
> It makes it difficult when resolving relative paths in the
> URIResolver as something that's a valid xs:anyURI in a call 
> to doc() causes a URISyntaxException in Java when it hits the 
> resolver - don't they both sing from the same rec?
> 
> The solution, it seems, is to manually %HH escape the spaces
> (and use the single-argument constructor), or manually deconstruct
> the href and then use the appropriate constructor, e.g. for the
> relative href "the file.xml":
> 
> new URI(null, null, href, null)
> 
> Neither seems like the right way.
> 
> Related-but-possibly-Saxon-specific-question: Should the href 
> argument that's passed to the resolve(href, base) method in 
> the URIResolver be %HH encoded - the base argument is, but 
> the href argument comes through exactly as it's written in 
> the stylesheet?
> 
> 
> --
> http://andrewjwelch.com
