Subject: Re: CDATA Help (in SAXON)
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 24 Oct 2000 18:40:20 -0600 (MDT)
Dylan Parker wrote:
> Can someone please explain to me why the following :
> 
>     <![CDATA[<BR/>]]>
> 
> ... gets converted to the following when output is set to html :
> 
>     &lt;BR/>

CDATA sections in an XML document serve no purpose other than to
unambiguously say "this is all text, not markup". In practice, all a CDATA
section does is save you from having to escape the beginning-of-markup
characters '<' and '&' and, on occasion, the end-of-markup character '>'
(which only has to be escaped when it would complete the sequence ']]>').

You are under the mistaken impression that '<' and '&' inside a CDATA
section mean something different from what '&lt;' and '&amp;' mean outside
one. They do not: an XML parser reports exactly the same characters for both.
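
To see this with an actual parser, here is a small sketch in Python rather
than XSLT (nothing here is Saxon-specific), using only the standard
library's `xml.etree.ElementTree` and `xml.sax.saxutils.escape`. It writes
the same character data once with escapes and once inside a CDATA section,
then checks what the parser hands back. The helper `parsed_text` and the
sample string are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Character data containing both markup-start characters, '<' and '&'.
payload = "a < b && c"

# Form 1: escape them by hand ('&' -> '&amp;', '<' -> '&lt;').
escaped_doc = "<t>" + escape(payload) + "</t>"

# Form 2: leave them alone, but wrap them in a CDATA section.
cdata_doc = "<t><![CDATA[" + payload + "]]></t>"

def parsed_text(doc):
    """Return the character data the parser reports for the root element (hypothetical helper)."""
    return ET.fromstring(doc).text

print(escaped_doc)               # <t>a &lt; b &amp;&amp; c</t>
print(parsed_text(escaped_doc))  # a < b && c
print(parsed_text(cdata_doc))    # a < b && c
```

Once parsed, nothing records which of the two spellings was used; both
yield the same string.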

To further clarify, this XML:

	<p>hello<BR/>world</p>

implies an XPath/XSLT node tree like this:

	element 'p'
	  |___text 'hello'
	  |___element 'BR'
	  |___text 'world'
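
One way to check that tree with a real parser is the Python sketch below.
ElementTree does not model text exactly the way XPath does (text that
follows a child element is stored in that child's `tail`), so the sketch
includes a small hypothetical helper, `xpath_children`, that lists an
element's content as XPath-style text and element children in document
order. It is an approximation of the XPath data model, not an
implementation of it.

```python
import xml.etree.ElementTree as ET

def xpath_children(elem):
    """List an element's children as ('text', ...) / ('element', ...) pairs.

    ElementTree keeps leading text in elem.text and the text after each
    child in child.tail; both are reported here as text nodes.
    """
    nodes = []
    if elem.text:
        nodes.append(("text", elem.text))
    for child in elem:
        nodes.append(("element", child.tag))
        if child.tail:
            nodes.append(("text", child.tail))
    return nodes

p = ET.fromstring("<p>hello<BR/>world</p>")
print(p.tag, xpath_children(p))
# p [('text', 'hello'), ('element', 'BR'), ('text', 'world')]
```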

While this XML:

	<p>hello&lt;BR/&gt;world</p>

or this XML:

	<p><![CDATA[hello<BR/>world]]></p>

are logically equivalent, implying a node tree like this:

	element 'p'
	  |___text 'hello<BR/>world'
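
The equivalence can be checked the same way in Python. This sketch parses
the escaped form, the CDATA form, and the form with a real BR element, and
compares their children using a hypothetical `xpath_children` helper that
approximates XPath text and element nodes on top of ElementTree's
`text`/`tail` model.

```python
import xml.etree.ElementTree as ET

def xpath_children(elem):
    """Approximate XPath children: leading text, then each child element and its tail text."""
    nodes = [("text", elem.text)] if elem.text else []
    for child in elem:
        nodes.append(("element", child.tag))
        if child.tail:
            nodes.append(("text", child.tail))
    return nodes

escaped = ET.fromstring("<p>hello&lt;BR/&gt;world</p>")
cdata = ET.fromstring("<p><![CDATA[hello<BR/>world]]></p>")
element = ET.fromstring("<p>hello<BR/>world</p>")

print(xpath_children(escaped))                           # [('text', 'hello<BR/>world')]
print(xpath_children(cdata))                             # [('text', 'hello<BR/>world')]
print(xpath_children(escaped) == xpath_children(cdata))    # True
print(xpath_children(escaped) == xpath_children(element))  # False
```

The first two documents produce one text node each; only the third
contains a BR element at all.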

If you have that in your result tree and you emit it as XML, the
serializer will write start and end tags for the element nodes, put the
attribute nodes inside those tags as quoted name/value pairs, and escape
the text nodes as needed. The last fragment above would very likely be
emitted as <p>hello&lt;BR/&gt;world</p>, although it wouldn't be wrong to
emit it as <p><![CDATA[hello<BR/>world]]></p>. You certainly wouldn't want
to get output of <p>hello<BR/>world</p>, because we've already established
that this means something completely different.
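
As a rough illustration (Python's ElementTree serializer standing in for
Saxon's XML outputter; the exact escaping choices differ between
serializers), here are the two result trees written out as XML. Note that
ElementTree also escapes '>' and writes empty elements as `<BR />` with a
space, both of which are equally valid serializations.

```python
import xml.etree.ElementTree as ET

# A <p> whose only child is one text node containing the characters "<BR/>".
text_only = ET.Element("p")
text_only.text = "hello<BR/>world"

# A <p> with a real BR element between two text nodes.
with_element = ET.Element("p")
with_element.text = "hello"
br = ET.SubElement(with_element, "BR")
br.tail = "world"

print(ET.tostring(text_only, encoding="unicode"))     # <p>hello&lt;BR/&gt;world</p>
print(ET.tostring(with_element, encoding="unicode"))  # <p>hello<BR />world</p>
```

The serializer never turns text into tags: the '<' in the text node comes
out escaped, and only the tree with a BR element gets a BR tag.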

HTML output is very similar. The only real difference in this case is
that empty elements are represented by what looks like a start tag only,
such as <BR> instead of <BR/> ... but that assumes you've got a BR
*element* in your result tree.
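
ElementTree happens to offer an HTML serialization method too, so the same
two trees can be written out that way. This is only an analogy for the
XSLT HTML output method, not Saxon's behavior: the BR element loses its
slash, while the text node is still escaped.

```python
import xml.etree.ElementTree as ET

# Text node containing the characters "<BR/>": no BR element anywhere.
text_only = ET.Element("p")
text_only.text = "hello<BR/>world"

# A real, empty BR element between two text nodes.
with_element = ET.Element("p")
with_element.text = "hello"
ET.SubElement(with_element, "BR").tail = "world"

print(ET.tostring(text_only, encoding="unicode", method="html"))
# <p>hello&lt;BR/&gt;world</p>
print(ET.tostring(with_element, encoding="unicode", method="html"))
# <p>hello<BR>world</p>
```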

So you are hereby challenged to get a BR element into the result tree, so
that the HTML outputter will serialize it as an actual BR tag. There are
three ways to do it:

1. <BR/> literal result element in the stylesheet
2. <xsl:element name="BR"/> instruction in the stylesheet
3. <xsl:copy-of select="/path/to/an/empty/BR/element/in/source/tree"/>
   instruction in the stylesheet
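
Python's standard library has no XSLT processor, so the following is only a
loose analogy for those three techniques: it puts a BR element into a
result tree (1) from literal markup, (2) from an element name supplied as a
string, and (3) by deep-copying an empty BR found in a source tree, then
serializes each result with ElementTree's HTML method. The source document
and the helper `result_with_br` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document containing an empty BR element to copy from.
source = ET.fromstring("<doc><para>hello<BR/>world</para></doc>")

def result_with_br(br):
    """Build <p>hello{br}world</p> around the given BR element, serialized as HTML."""
    p = ET.Element("p")
    p.text = "hello"
    br.tail = "world"
    p.append(br)
    return ET.tostring(p, encoding="unicode", method="html")

# 1. Roughly like a literal result element: the BR is written as markup.
way1 = result_with_br(ET.fromstring("<BR/>"))
# 2. Roughly like xsl:element: the element name is just a string value.
name = "BR"
way2 = result_with_br(ET.Element(name))
# 3. Roughly like xsl:copy-of: deep-copy an existing BR from the source tree.
way3 = result_with_br(copy.deepcopy(source.find("para/BR")))

print(way1)                  # <p>hello<BR>world</p>
print(way1 == way2 == way3)  # True
```

Whichever way the element gets there, the serializer produces the same tag,
because it only looks at the tree.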

All well and good, but you said you wanted <BR/>, which is *not HTML*. Why,
then, are you relying on the HTML output method? Tsk. That leaves only one
way, the wrong way, which is despised because
 - people who rely on it tend to produce malformed documents
 - people who rely on it think in terms of tags-in, tags-out, when
   they should be thinking about the information set implied by an XML
   document, the trees based on that information, and the automatic
   derivation of output from the new trees that the stylesheet
   constructs...

<xsl:value-of select="'&lt;BR/&gt;'" disable-output-escaping="yes"/>
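
disable-output-escaping pastes characters into the output without going
through the tree, so nothing checks that they add up to well-formed markup.
A rough Python analogy (plain string concatenation standing in for the
serializer hack; this is not how Saxon implements it, and `paste_raw` is a
hypothetical name) shows how easily that goes wrong: a stray `<BR>` instead
of `<BR/>` produces output that an XML parser rejects.

```python
import xml.etree.ElementTree as ET

def paste_raw(before, raw, after):
    """Hypothetical stand-in for disable-output-escaping: raw is emitted unescaped."""
    return "<p>" + before + raw + after + "</p>"

ok = paste_raw("hello", "<BR/>", "world")
broken = paste_raw("hello", "<BR>", "world")

print(ok)  # <p>hello<BR/>world</p>
# The BR element only exists once something re-parses the output.
reparsed = ET.fromstring(ok)
print(reparsed.find("BR") is not None)  # True

try:
    ET.fromstring(broken)
    print("well-formed")
except ET.ParseError as err:
    print("malformed:", err)
```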

Hey, you asked for verbosity.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

