Re: [xsl] double escaping problem [re-visited]

Subject: Re: [xsl] double escaping problem [re-visited]
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Tue, 13 Nov 2007 07:34:42 +0100
pkeane wrote:

Hmmm. I was afraid of that. I am still baffled as to how to tell my stylesheet that the input it gets from a particular source tree by way of the document() function has already been escaped, and therefore that '&amp;' need not be escaped again (making it '&amp;amp;').


Here's the xml coming from http://example.com/collections.xml:
--------------------------
[...]
<collections>
<collection name="Art &amp; Art History Collection" id="1"/>
<collection name="Photography Collection" id="2"/>
</collections>

The stylesheet does not escape the &amp;. It really is in your source. It says: "the literal value of attribute @name is 'Art & Art History Collection'". The &-character is reserved in XML and _must_ be escaped inside an attribute value. The same goes for the <-character (written as &lt;). Any other character can be used literally, although a quote must also be escaped when it is the same kind of quote that delimits the attribute value.
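
As a rough illustration of that rule, here is a minimal sketch using Python's standard-library xml.etree.ElementTree parser. Python is only a convenient stand-in for "any XML processor"; the helper name is_well_formed and the sample attribute values are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the XML parser accepts xml_text, False otherwise.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A bare & or < inside an attribute value is rejected by the parser.
bare_ampersand = is_well_formed('<collection name="Art & Art History"/>')
bare_less_than = is_well_formed('<collection name="a < b"/>')

# The escaped form is fine, and so are other characters used literally.
escaped_ampersand = is_well_formed('<collection name="Art &amp; Art History"/>')
bare_greater_than = is_well_formed('<collection name="a > b"/>')
apostrophe_in_double_quotes = is_well_formed('<collection name="Abel\'s list"/>')
```

The first two checks should come out False and the remaining three True, which matches the rule: & and < must be escaped, everything else may appear as-is unless it collides with the attribute's own quote character.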


If your source is as shown above (meaning that this is the literal textual view, and not the view that a browser or other interpreter gives you), then the XML processor (_not_ the XSLT processor; this happens before XSLT "sees" it) will interpret the &amp; as a literal &-character. When you process it and output it (e.g., with a copy template), the output is written with &amp;, not &amp;amp; (unless, again, what you show above is a browser's rendering rather than the literal source). Any XML processor reading that output will in turn interpret the &amp; as the literal &-character.
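
The round trip described above can be sketched as follows. This assumes Python's xml.etree.ElementTree as a stand-in for the parse-then-serialize steps (it is not an XSLT processor), and it inlines the collections.xml fragment quoted earlier instead of fetching it.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, as literal source text.
source = (
    '<collections>'
    '<collection name="Art &amp; Art History Collection" id="1"/>'
    '<collection name="Photography Collection" id="2"/>'
    '</collections>'
)

root = ET.fromstring(source)

# What the application (or a stylesheet) sees: the unescaped value.
name = root[0].get("name")

# What gets written on output: the & is escaped again, exactly once.
serialized = ET.tostring(root, encoding="unicode")
```

Here name holds the plain string with a single &-character, and serialized contains &amp; once, never &amp;amp;. An identity (copy) transform in XSLT is expected to behave the same way, since escaping on output is the serializer's job.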

From reading this thread (only partially, I'm afraid), it sounds to me as if your source contains a double-escaped ampersand. Otherwise the XSLT processor will never create a double-escaped ampersand all by itself. Using D-O-E (disable-output-escaping) in this case suppresses the normal escaping mechanism, which results in half-escaping: an &amp;amp; in the source becomes &amp; in the output, and an &amp; in the source becomes a single &-character in the output, possibly making the XML output not well-formed.
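
To make the half-escaping concrete, here is a small emulation in Python. It is only a sketch: normal_output and doe_like_output are hypothetical helpers that mimic a serializer's default text escaping and the effect of disable-output-escaping respectively, and the sample values are invented. No XSLT processor is involved.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def attribute_value(xml_text):
    """Parse a one-element document and return its @name as the parser reports it."""
    return ET.fromstring(xml_text).get("name")

def normal_output(value):
    """Mimic default serialization of text: special characters are escaped."""
    return escape(value)

def doe_like_output(value):
    """Mimic disable-output-escaping: the value is written out untouched."""
    return value

# Source with a double-escaped ampersand: the parsed value is the text "&amp;".
double_escaped = attribute_value('<collection name="Art &amp;amp; Design"/>')

# Correctly escaped source: the parsed value contains a plain &.
single_escaped = attribute_value('<collection name="Art &amp; Design"/>')

normal_from_double = normal_output(double_escaped)   # Art &amp;amp; Design
doe_from_double = doe_like_output(double_escaped)    # Art &amp; Design
doe_from_single = doe_like_output(single_escaped)    # Art & Design (bare &)
```

The last value shows the risk: written into an XML document as-is, the bare & makes the output not well-formed.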

Are you certain that your source really is as you show above, and that the processor is adding the escaping? In years of working with XML I have never seen that happen (only visually, as a result of misinterpreting what we see). Make sure you view your source with an XML editor or a text editor, *never* the browser.
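
If an editor is not at hand, the raw text can also be checked programmatically. The following is a hedged sketch: find_double_escapes is a hypothetical helper, and it assumes you already have the unrendered source as a string (for example, saved to disk by a tool that does not interpret it).

```python
import re

# Matches an escaped ampersand immediately followed by the rest of an
# entity or character reference, e.g. "&amp;amp;" or "&amp;#38;".
DOUBLE_ESCAPE = re.compile(r"&amp;(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);")

def find_double_escapes(raw_text):
    """Return every substring of raw_text that looks double-escaped."""
    return DOUBLE_ESCAPE.findall(raw_text)

suspect = find_double_escapes('<collection name="Art &amp;amp; Art History"/>')
clean = find_double_escapes('<collection name="Art &amp; Art History"/>')
```

A hit is only a hint, since a document may legitimately contain the text "&amp;" as data, but an empty result on the raw source is a good sign that the double escaping is introduced somewhere else.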

Cheers,
-- Abel Braaksma
