attribute value escaping (was Re: disable escaping in copy)

Subject: attribute value escaping (was Re: disable escaping in copy)
From: Mike Brown <mike@xxxxxxxx>
Date: Thu, 8 Jun 2000 16:05:28 -0600 (MDT)
> > > <img src="`request.getParameter(%22path%22)`/img.gif"></img>
> >
> > What XSL processor is giving you %22?!
> 
> I am using Xalan

Ah, right, as Mike Kay pointed out, %22 is the correct way to escape a
double quote in a URI. Xalan must assume, for the HTML output method, that
the src attribute on an img element contains a URI, which is a reasonable
assumption.

FWIW, http://www.ietf.org/rfc/rfc2396.txt section 2.4.3 says that the
backquote (`) must be escaped as well. I'm not convinced that Xalan is
making a good decision here, because section 2.4.2 says

   A URI is always in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics.  Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded.

For Xalan to apply escaping blindly to a completed URI is a violation of
this statement and of common sense. My feeling is that you should really
be getting the kind of escaping I mentioned (&#34; or &quot;), which just
disambiguates double quotes when they are used as character data instead
of as markup delimiters.
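
To make the difference between the two kinds of escaping concrete, here is
a rough Python sketch. It is not what Xalan does internally; it just uses
the standard library's urllib.parse.quote and html.escape as stand-ins for
"URI escaping" and "markup escaping". The safe="/()." argument is my own
choice, made so that only the double quotes get percent-escaped, as in the
output quoted above.

```python
from html import escape, unescape
from urllib.parse import quote, unquote

# The attribute content from the original stylesheet (illustrative).
value = 'request.getParameter("path")'

# URI-level escaping: the double quotes become %22 and stay that way
# in the attribute value that the consuming application sees.
uri_escaped = quote(value, safe="/().")

# Markup-level escaping: the double quotes become &quot;, which an
# HTML/XML parser turns back into " before handing the value on.
markup_escaped = escape(value, quote=True)

print(uri_escaped)     # request.getParameter(%22path%22)
print(markup_escaped)  # request.getParameter(&quot;path&quot;)

# Each form is reversed by a different layer: the markup parser undoes
# the character reference, while %22 needs a separate URI-decoding step.
print(unescape(markup_escaped) == value)  # True
print(unquote(uri_escaped) == value)      # True
```

The point is that the markup-level form is invisible to whatever reads the
parsed attribute, while the %22 form changes what that consumer receives.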

I say this because according to the HTML 4.01 spec, an <img> src is never
supposed to be what the spec calls script data; it can only be a URI and
thus should not be subjected to the no-escaping-for-<script>-and-<style>
rule in http://www.w3.org/TR/xslt#section-HTML-Output-Method.

This actually brings up a hole in the XSLT spec; it doesn't say escaping
should be disabled for "script data" attributes like onClick.

Here is a proposal for discussion. Would this be reasonable behavior to
expect from an XSLT processor? --

  When the output method is 'html', and an attribute value contains
  double quotes (") but no single quotes ('), or single quotes but no
  double quotes, the quotes in the value should be unescaped and the
  other kind of quote used to delimit the attribute value.

I don't think it would break anything, because true URIs won't contain
unescaped quotes, and script data will benefit.
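
The proposal above can be sketched roughly as follows. This is only an
illustration in Python, not code from any XSLT processor; the function
name serialize_html_attribute is made up, and the sketch deliberately
ignores every other escaping rule (for & and so on) to focus on the choice
of delimiter. Falling back to &quot; when both kinds of quote are present
is my assumption; the proposal does not cover that case.

```python
def serialize_html_attribute(name: str, value: str) -> str:
    """Serialize one attribute, choosing the delimiter by the quotes in value.

    Hypothetical helper: handles only quote characters, nothing else.
    """
    has_double = '"' in value
    has_single = "'" in value

    if has_double and not has_single:
        # Only double quotes inside: delimit with single quotes and
        # leave the value untouched.
        return f"{name}='{value}'"

    if has_double and has_single:
        # Both kinds present (not covered by the proposal): assume the
        # usual fallback of escaping the double quotes.
        value = value.replace('"', "&quot;")

    # No quotes, only single quotes, or the escaped fallback:
    # delimit with double quotes.
    return f'{name}="{value}"'


print(serialize_html_attribute("onclick", 'alert("hi")'))
print(serialize_html_attribute("title", "it's"))
print(serialize_html_attribute("src", "img.gif"))
```

With this rule, script data such as alert("hi") comes out readable, and a
real URI, which contains no unescaped quotes, is serialized exactly as
before.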

> > An HTML engine that knows what it is doing should handle
> > <element attribute="stuff&#34;stuff&#34;stuff"> correctly.
> 
> The output is JHTML, and Dynamo is complaining about the %22. It expects the
> string parameter of getParameter() to be enclosed in double quotes ("").

Ah, not an unreasonable expectation, then.

> When using: <xsl:value-of select="@*" disable-output-escaping="yes"/>
> the attribute value is output as expected (unescaped), although not as an
> img attribute:
> <img>"`request.getParameter("path")`/img.gif"</img>

Yeah, xsl:value-of creates text nodes.
You might check and see if this works:

<img>
  <xsl:attribute name="src">
    <xsl:value-of disable-output-escaping="yes" select="@*"/>
  </xsl:attribute>
</img>

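...although it shouldn't, since the XSLT spec makes it an error to disable
output escaping for a text node that ends up anywhere other than as a text
node in the result tree (here, an attribute value).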

> > Also you are getting </img> closing tags in the output. Did you put
> > <xsl:output method="html"/> in your stylesheet?
> 
> Yes. The input xml is a modified JHTML. The desired output is JHTML.

If Xalan emits </img> closing tags with the html output method when there's
no text in between, that's a bug. Sorry I can't be of more help.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

