RE: [xsl] double escaping problem [re-visited]

Subject: RE: [xsl] double escaping problem [re-visited]
From: pkeane <pkeane@xxxxxxxxxxxxxxx>
Date: Tue, 13 Nov 2007 08:28:45 -0600 (CST)

On Tue, 13 Nov 2007, Michael Kay wrote:

The document() function invokes an XML parser and it can only do what an XML
parser does.

In fact an XML parser removes one level of escaping, and a serializer adds
it back. So the parser turns "&amp;" into "&" and "&amp;amp;" into "&amp;",
and the serializer turns them back into "&amp;" and "&amp;amp;"
respectively, unless d-o-e is set, in which case they are turned into "&"
and "&amp;" respectively. All the evidence is that your XML source as read
by the parser was actually double-escaped. This quite often happens when you
have fragments of XML stored in a database: if you try to extract it as XML,
and the database software doesn't realise that it's already in XML format,
then the database software adds a level of escaping that you don't want. The
way to get rid of it is to change the way you do the database query.

Michael Kay
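
The round trip described above can be checked with any XML parser and serializer. The following is a minimal sketch using Python's standard xml.etree.ElementTree rather than an XSLT processor; the element name and sample text are made up for illustration, and it does not model disable-output-escaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: one escaped correctly, one double-escaped.
single_src = "<t>Tom &amp; Jerry</t>"
double_src = "<t>Tom &amp;amp; Jerry</t>"

# Parsing removes exactly one level of escaping.
single_text = ET.fromstring(single_src).text
double_text = ET.fromstring(double_src).text

# Serializing adds that one level back, so each document round-trips unchanged.
single_out = ET.tostring(ET.fromstring(single_src), encoding="unicode")
double_out = ET.tostring(ET.fromstring(double_src), encoding="unicode")

print(repr(single_text))  # text as the stylesheet would see it
print(repr(double_text))  # still contains an entity-like "&amp;"
print(single_out)
print(double_out)
```

If the parsed text still contains "&amp;", as double_text does here, the source was double-escaped before the parser ever saw it.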

Thanks to you and Abel Braaksma, I have figured out the problem, which I'll describe briefly here. One mistake was assuming that the "raw" XML as displayed by Firefox was the actual text of the XML; in fact, the display was masking a double-escaped ampersand.

The double escaping in the source document came from my switch from building the XML out of the database by string concatenation to using PHP's SimpleXML functions. In the string-concatenation code I used the htmlspecialchars() function to escape text coming from the database. I failed to realize that this was unnecessary with the SimpleXML functions, since they do all the escaping needed to create valid XML. Eliminating those superfluous htmlspecialchars() calls fixed the problem.
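
The same mistake can be reproduced outside PHP. This is a rough Python analogue, not the original code: xml.sax.saxutils.escape() stands in for htmlspecialchars(), xml.etree.ElementTree stands in for SimpleXML, and the value and the build_item() helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text as it might come out of the database.
value = "Tom & Jerry"


def build_item(text):
    """Build <item>text</item> with an XML library, which escapes on output."""
    item = ET.Element("item")
    item.text = text
    return ET.tostring(item, encoding="unicode")


# String concatenation: escaping by hand is required here.
by_hand = "<item>" + escape(value) + "</item>"

# The bug: the manual escape is kept after switching to the library,
# so the text is escaped once by hand and once more on serialization.
double_escaped = build_item(escape(value))

# The fix: pass the raw text and let the library do the only escaping.
fixed = build_item(value)

print(by_hand)
print(double_escaped)
print(fixed)
```

The design point is that escaping should happen exactly once, at the layer that writes the markup; any earlier escape call becomes part of the data.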

Many thanks!
Peter Keane
