Subject: RE: [xsl] xs:anyURI allows a space
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 16 Jul 2007 15:49:05 +0100

The rules in XML Schema for xs:anyURI are not exactly a model of clarity, and I have whinged about them loudly and often. The sentence "The mapping from anyURI values to URIs is as defined by the URI reference escaping procedure..." means (I am told) that to get from an xs:anyURI value to a real URI, you percent-encode the characters that are not allowed in a URI. This includes space.

Later on the spec says "Note: Spaces are, in principle, allowed in the ·lexical space· of anyURI, however, their use is highly discouraged (unless they are encoded by %20)." This has caused no end of confusion, because in fact spaces are also allowed in the value space (which is the same as the lexical space). And of course, discouraging something in a Note doesn't make it invalid.

The whitespace facet for xs:anyURI, which defines what happens to whitespace as written in the source XML before creating the "lexical" value, is "collapse" (see the schema-for-schemas). This is essentially the same as XPath normalize-space(), so it can leave single spaces in the middle of the value. There's a case to be made for collapsing whitespace before calling the URIResolver, but I don't think it's a strong one on either theoretical or practical grounds.

The JAXP spec for the URIResolver interface is not exactly crystal-clear either. However, it doesn't mandate that the XSLT processor should do anything to the values supplied in the stylesheet source before calling the URIResolver, and it doesn't mandate that the values passed across should be valid URIs (or relative references) as per the RFC. My interpretation of the spec (and yours has just as much claim to validity...) is that it's the responsibility of the URIResolver to do any percent-encoding that is needed, and if you look at Saxon's StandardURIResolver you will see that it does this.

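As a rough illustration of a resolver that takes on the percent-encoding itself, here is a minimal sketch. The class name SpaceEscapingResolver and its helper methods are hypothetical; this is not Saxon's StandardURIResolver, and it only escapes spaces, on the assumption that the base argument arrives already escaped and that space is the only problem character in the href.

```java
import java.net.URI;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver for illustration; this is not Saxon's StandardURIResolver.
final class SpaceEscapingResolver implements URIResolver {

    // Percent-encode spaces only; every other character is left as supplied.
    static String escapeSpaces(String href) {
        return href.replace(" ", "%20");
    }

    // Escape the raw href, then resolve it against the base URI if one is given.
    // Assumes the base is already a syntactically valid (escaped) URI.
    static String toAbsolute(String href, String base) {
        URI relative = URI.create(escapeSpaces(href));
        if (base == null || base.isEmpty()) {
            return relative.toString();
        }
        return URI.create(base).resolve(relative).toString();
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            return new StreamSource(toAbsolute(href, base));
        } catch (IllegalArgumentException e) {
            // URI.create reports a syntax error as IllegalArgumentException.
            throw new TransformerException("Cannot resolve href: " + href, e);
        }
    }
}
```

Such a resolver would be registered in the usual JAXP way, for example with TransformerFactory.setURIResolver(). A real implementation would need to escape more than spaces.
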
The advantage of giving the URIResolver the raw string as supplied to xsl:include or document() is that it gives the user maximum control over how to interpret it.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Andrew Welch [mailto:andrew.j.welch@xxxxxxxxx]
> Sent: 16 July 2007 15:11
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] xs:anyURI allows a space
>
> Why is it that xs:anyURI allows a space, but Java's URI
> class does not (when using the single argument constructor)?
>
> It makes it difficult when resolving relative paths in the
> URIResolver, as something that's a valid xs:anyURI in a call
> to doc() causes a URISyntaxException in Java when it hits the
> resolver - don't they both sing from the same rec?
>
> The solution, it seems, is to manually %HH escape the spaces
> (and use the single arg constructor), or manually deconstruct
> the href and then use the appropriate constructor, eg for the
> relative href "the file.xml":
>
> new URI(null, null, href, null)
>
> Neither seems like the right way.
>
> Related-but-possibly-Saxon-specific-question: Should the href
> argument that's passed to the resolve(href, base) method in
> the URIResolver be %HH encoded - the base argument is, but
> the href argument comes through exactly as it's written in
> the stylesheet?
>
> --
> http://andrewjwelch.com
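
The difference between the two java.net.URI constructors mentioned in the quoted question can be reproduced with a small sketch. The class name UriSpaceDemo and its method names are made up for illustration; only the java.net.URI calls come from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; only the java.net.URI calls are real JDK API.
final class UriSpaceDemo {

    // True if the single-argument constructor accepts the string as-is.
    static boolean parsesWithSingleArg(String href) {
        try {
            new URI(href);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Treats the whole href as a path; the multi-argument constructor
    // quotes characters that are illegal in a path, such as space.
    static String quoteAsPath(String href) {
        try {
            return new URI(null, null, href, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesWithSingleArg("the file.xml")); // false
        System.out.println(quoteAsPath("the file.xml"));         // the%20file.xml
    }
}
```

Note that the multi-argument constructors always quote the percent character, so an href that already contains %20 would be escaped a second time if passed through quoteAsPath.
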