RE: disable-output-escaping

Subject: RE: disable-output-escaping
From: Mike Brown <mbrown@xxxxxxxxxxxxx>
Date: Mon, 24 Jan 2000 14:01:42 -0700
patrick honner wrote:
> The problem is that I want to have, as an XML-element, a 
> large HTML file that will be properly rendered upon the
> <xsl:value-of> call.

I would recommend seeking to ensure that the ESSAY element contents are
well-formed XML, eliminating the need for the unescaped CDATA hacks and
thereby allowing you to make use of more powerful XSLT instructions like
xsl:copy-of instead of xsl:value-of.
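
One rough way to find out whether the ESSAY contents could be made
well-formed is to try parsing them. The sketch below is only an
illustration using Python's standard library; the helper name
is_well_formed_fragment and the sample strings are my own assumptions,
not anything from your document.

```python
import xml.etree.ElementTree as ET

def is_well_formed_fragment(fragment):
    """Return True if the fragment parses as XML once wrapped in a dummy root."""
    try:
        # A fragment may have several top-level nodes, so wrap it first.
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    except ET.ParseError:
        return False
    return True

# Balanced tags: could live in the XML directly, no CDATA section needed.
tidy = is_well_formed_fragment("<B>Here</B> is text about <I>Apples</I>")

# Typical sloppy HTML: <P> and <BR> are never closed.
sloppy = is_well_formed_fragment("<P>Unclosed paragraph<BR>second line")

print(tidy, sloppy)
```

Fragments that pass a check like this are the ones that can be embedded as
real elements and handled with xsl:copy-of.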

Barring that, you are stuck, because an XSLT processor is not required to
support disable-output-escaping (or output at all, for that matter). Maybe
you could create server-side include directives instead of copying the
possibly ill-formed HTML document into your otherwise tidy XML?

The following explanation may be overkill.

If you are thinking in terms of tags, and treating markup as CDATA, well,
that's usually bad, although in your case it might be necessary to do so if
(and only if) you are not able to ensure that the contents of the ESSAY
element are well-formed XML.

Your document looks like this:

 <FRUIT>
     <NAME>Apple</NAME>
     <ESSAY><![CDATA[ <B>Here</B> is some HTML that I want to output
                      about <I>Apples</I>
                ]]></ESSAY>
 </FRUIT>


And the corresponding XPath/XSLT model's node structure, in my best ASCII
art, with \n for newlines, looks like this:

 |___element 'FRUIT'
       |___text '\n    '
       |___element 'NAME'
       |     |___text 'Apple'
       |___text '\n    '
       |___element 'ESSAY'
       |     |___text ' <B>Here</B> is some HTML that I want
       |               to output\n                          
       |               about <I>Apples</I>\n               \n\n'
       |___text '\n'

If you didn't have those CDATA delimiters in the XML, the structure would
look like this instead:

 |___element 'FRUIT'
       |___text '\n    '
       |___element 'NAME'
       |     |___text 'Apple'
       |___text '\n    '
       |___element 'ESSAY'
       |     |___text ' '
       |     |___element 'B'
       |     |     |___text 'Here'
       |     |___text ' is some HTML that I want to output\n
       |     |                    about '
       |     |___element 'I'
       |     |     |___text 'Apples'
       |     |___text '\n               \n\n'
       |___text '\n'
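
The difference between the two trees above can be reproduced with any XML
parser. Here is a minimal sketch using Python's standard library; the
document strings are a shortened, assumed version of the FRUIT example, and
describe_essay is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened version of the FRUIT document: once with the ESSAY
# body inside a CDATA section, once with the same body as real markup.
WITH_CDATA = (
    "<FRUIT>\n"
    "    <NAME>Apple</NAME>\n"
    "    <ESSAY><![CDATA[ <B>Here</B> is some HTML about <I>Apples</I>]]></ESSAY>\n"
    "</FRUIT>"
)
WITHOUT_CDATA = WITH_CDATA.replace("<![CDATA[", "").replace("]]>", "")

def describe_essay(xml_text):
    """Return (names of child elements, leading text) for the ESSAY element."""
    essay = ET.fromstring(xml_text).find("ESSAY")
    return [child.tag for child in essay], essay.text

cdata_children, cdata_text = describe_essay(WITH_CDATA)
plain_children, plain_text = describe_essay(WITHOUT_CDATA)

# With CDATA: no child elements, one text node that contains the "tags".
print(cdata_children, repr(cdata_text))
# Without CDATA: B and I are element nodes; the leading text is one space.
print(plain_children, repr(plain_text))
```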

> The HTML tags are just ignored when they stand alone inside 
> an element (at least when using XT).  They don't appear at
> all in the output, not even as literals.

As well they shouldn't. xsl:value-of creates in the result tree a text node
containing the string value of the first node, in document order, selected by
the select attribute.
The string value of an element is the (often rather useless) concatenation
of all *descendant* text nodes in that element. String values are a concept
explained in the XPath spec. So the HTML tags aren't being ignored; they
don't even exist in the model, really. The element that each pair of tags
represents is ignored, and only the text node descendants are copied.
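
To make the string-value idea concrete, here is a small sketch in Python
that approximates it by joining all descendant text of an element. The
function name string_value and the sample ESSAY string are assumptions for
illustration only; this is not how an XSLT processor is implemented.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Approximate the XPath string-value: descendant text in document order."""
    return "".join(element.itertext())

# ESSAY with real B and I child elements (no CDATA section).
essay = ET.fromstring(
    "<ESSAY> <B>Here</B> is some HTML about <I>Apples</I></ESSAY>"
)

value = string_value(essay)

# Only the text survives; the B and I elements contribute no markup.
print(repr(value))
```

This is the same effect you saw with XT: the text inside B and I is kept,
but nothing in the string says those elements were ever there.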

When the result tree is serialized as XML or HTML, markup is created based
on the types of nodes in the tree and their names and/or contents, depending
on the type of node. A *single* element node in the result tree, if that
node has descendant nodes of any type, manifests as a *pair* of tags (one
opening, one closing), and in between those tags is the serialized version
of the descendant nodes. When a text node is serialized as XML or HTML, in
order to prevent its character data from being confused with markup, the text is
escaped per the appropriate convention.
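
The same behavior shows up in other serializers. The sketch below uses
Python's ElementTree rather than an XSLT processor, so treat it as an
analogy: the element names and text are made up, and the point is only
that a text node is escaped on output while an element node becomes a pair
of tags.

```python
import xml.etree.ElementTree as ET

# Case 1: a P element whose only child is a text node that happens to
# contain angle brackets and an ampersand.
as_text = ET.Element("P")
as_text.text = "<B>Here</B> & there"
text_node_output = ET.tostring(as_text, encoding="unicode")

# Case 2: the same content modeled as a B element node followed by text.
as_tree = ET.Element("P")
bold = ET.SubElement(as_tree, "B")
bold.text = "Here"
bold.tail = " & there"
element_node_output = ET.tostring(as_tree, encoding="unicode")

print(text_node_output)     # markup characters in the text are escaped
print(element_node_output)  # the B element is written as a pair of tags
```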

As David Carlisle said, XSLT is not well-suited to transforming unstructured
data (such as the markup you want/need to treat as pure character data) into
structured data -- i.e., a node tree that reflects the structure implied by
that markup and that can be serialized in a consistent,
well-formed-document-producing manner.
