Re: [xsl] mystery #3: rendering embedded HTML

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Sat, 13 Apr 2002 17:53:45 +0100

Hi Gary,

>>>>>> "J" == Jeni Tennison <jeni@xxxxxxxxxxxxxxxx> writes:
>
>     J> You can use disable-output-escaping in this situation. 
>
> Not quite.  doe works for inline literal markup chars:
>
>     J> <envelope> <![CDATA[ <p>My mal-formed HTML.<br> ]]> </envelope>
>
> My situation is the inverse of doe. What I have is
>
>      <envelope>&lt;p&gt;My mal-formed HTML escaped.&lt;br&gt;</envelope>

Like you said later, as far as an XSLT processor is concerned, these
two bits of XML are exactly the same. CDATA sections are simply a
shorthand so you don't have to escape each XML-significant character
individually. If you think that disable-output-escaping only works if
your HTML has been escaped with CDATA sections, you're mistaken.
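
To see the equivalence outside XSLT, here is a small sketch that uses
Python's standard xml.etree.ElementTree parser as a stand-in for the
parser your XSLT processor sits on top of. The choice of Python and the
variable names are mine, purely for illustration; the two strings are
the same content spelled once with a CDATA section and once with
escapes.

```python
import xml.etree.ElementTree as ET

# The same content written two ways: inside a CDATA section, and with
# each XML-significant character escaped individually.
with_cdata = "<envelope><![CDATA[<p>My mal-formed HTML escaped.<br>]]></envelope>"
with_escapes = "<envelope>&lt;p&gt;My mal-formed HTML escaped.&lt;br&gt;</envelope>"

cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(with_escapes).text

# After parsing, nothing records which spelling the document used.
print(cdata_text == escaped_text)
print(escaped_text)
```

Both parses give a text node holding the literal characters, so
anything downstream of the parser cannot tell the two documents apart.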

>     J> If this HTML makes up the majority of your page, the other
>     J> option is to use the text output method rather than the XML
>     J> output method:
>
>     J> <xsl:output method="text" />
>
> Again, this is if the XML data contains invalid chars; it
> doesn't, it contains _escaped_ chars which need to be resolved back
> into invalid chars. It needs an entity resolver.

You haven't double-escaped your entities, as far as I can tell, so you
shouldn't need an entity resolver. When you have an XML document like:

  <envelope>&lt;p&gt;My mal-formed HTML escaped.&lt;br&gt;</envelope>

The XSLT processor views this as a tree like this:

  envelope
    +- text: "<p>My mal-formed HTML escaped.<br>"

The string value of the text node is a less-than sign, followed by a
'p', followed by a greater-than sign and so on. The string value
contains those literal characters, not the escapes.

When you use an xsl:value-of instruction, you create a text node in
the result tree with exactly this same value. The XSLT processor then
has to serialize that result tree in some way, either as XML or as
text.

If it serializes as XML, it has to escape any less-than characters in
the string value of the text node, because a literal less-than sign in
XML text would be read as the start of a tag. Greater-than signs can
be left alone. So you get:

  &lt;p>My mal-formed HTML escaped.&lt;br>
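
The same escaping step can be watched in isolation with a rough
sketch, again assuming Python's xml.etree.ElementTree as a stand-in
for the XSLT processor's XML serializer (an illustration, not what
your processor runs). Note that this particular serializer also
escapes greater-than signs, which is allowed but not required, so its
output differs slightly from the line above.

```python
import xml.etree.ElementTree as ET

# Build a one-node tree whose text holds literal markup characters,
# just like the result tree that xsl:value-of would create.
envelope = ET.Element("envelope")
envelope.text = "<p>My mal-formed HTML escaped.<br>"

# Serializing as XML re-escapes the significant characters.
as_xml = ET.tostring(envelope, encoding="unicode")
print(as_xml)
```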

Plain text, on the other hand, has no characters with special meaning
(or at least none that the XSLT processor knows about). So every
character is output exactly as it is rather than being escaped, and
the output is:

  <p>My mal-formed HTML escaped.<br>
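
As a hedged illustration of text serialization, this sketch uses the
method="text" option of Python's xml.etree.ElementTree serializer,
which I am assuming behaves closely enough to the XSLT text output
method for this purpose: it writes out string values with no escaping
at all.

```python
import xml.etree.ElementTree as ET

# Parse the escaped document; the text node holds literal characters.
envelope = ET.fromstring(
    "<envelope>&lt;p&gt;My mal-formed HTML escaped.&lt;br&gt;</envelope>"
)

# The text method emits the characters as they are, with no escaping.
as_text = ET.tostring(envelope, encoding="unicode", method="text")
print(as_text)
```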

The same happens with disable-output-escaping="yes" on xsl:value-of
(or xsl:text), where you tell the processor that you don't want it to
do its usual escaping tricks with a certain piece of text.

If this isn't working for you, please post a sample of your XML and
the relevant part of your XSLT stylesheet and we'll see what we can
do.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

