Re: [xsl] Preserving XHTML markup

Subject: Re: [xsl] Preserving XHTML markup
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 5 Feb 2002 01:44:56 -0700 (MST)
Eric Vitiello wrote:
> We never confuse the occupants of a building with its structure, and in the
> same vein, we should keep a distinct line between the structure and content
> of an XML document.
>
> Sometimes, the content itself is XML, and to distinguish between this XML
> and the main structure, CDATA blocks are used.

You're one layer of abstraction shy.

The 'content' of an XML document is arguably only the runs of character data
that are dispersed throughout the document as the content of 'logical'
structures: nestable, named elements and their name-value attribute pairs.

The 'physical' (lexical) markup imposes this structure on the character data,
and provides a few other conveniences for dealing with composite documents and
their binary representations. One such markup construct is the CDATA section,
which serves no purpose other than to designate a run of character data as
such when it might otherwise be confused with markup because it contains "<"
or "&". Your description of CDATA sections is not inaccurate in this regard,
but it illustrates only one of many possible use cases.
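To make the "might otherwise be confused with markup" case concrete, here is a
minimal sketch using Python's standard-library xml.etree.ElementTree (an
illustrative choice of parser; nothing in this thread depends on it). A bare
"&" makes the document not well-formed, while the same characters inside a
CDATA section parse as ordinary character data:

```python
import xml.etree.ElementTree as ET

# A bare "&" followed by a space is not a valid entity or character
# reference, so the parser rejects this document as not well-formed.
try:
    ET.fromstring("<foo>1 & 2</foo>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False

# Inside a CDATA section, the same characters are read as plain
# character data rather than as the start of a reference.
wrapped = ET.fromstring("<foo><![CDATA[1 & 2]]></foo>")

print(bare_ampersand_ok)  # False
print(wrapped.text)       # 1 & 2
```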

One of the points you seem to be missing in this thread is that XPath and XSLT
operate on a data model that sits just slightly above the character data
divided into elements and attributes that the markup imposes and the XML
parser reports. This DOM-like model uses node trees to represent the source
document, the transformation result before the XSLT processor automatically
serializes it, and even the stylesheet itself. If you fully appreciate this
model and the processing model that XSLT uses, then 99% of the time, arguments
for using disable-output-escaping turn out to be exercises in kludgery (pardon
my invention of new words here): attempts to work around (mis)perceived
shortcomings in result tree serialization instead of understanding how to
manage the content and the transformation correctly, from an XSLT processing
standpoint, in the first place.
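The same point can be sketched outside XSLT. The following uses Python's
xml.etree.ElementTree as a stand-in for a result tree and its serializer (an
analogy only, not how any particular XSLT processor works): text that merely
looks like markup gets escaped on output because it is character data, while
markup you actually want in the output has to exist as element nodes in the
tree.

```python
import xml.etree.ElementTree as ET

# Markup kept as a string of character data: the serializer correctly
# escapes it, because in the tree it is text, not elements.
as_text = ET.Element("p")
as_text.text = "Hello <b>world</b>"

# Markup built as nodes: <b> is a real element in the tree, so it
# serializes as markup with no escaping tricks needed.
as_nodes = ET.Element("p")
as_nodes.text = "Hello "
bold = ET.SubElement(as_nodes, "b")
bold.text = "world"

print(ET.tostring(as_text, encoding="unicode"))
# <p>Hello &lt;b&gt;world&lt;/b&gt;</p>
print(ET.tostring(as_nodes, encoding="unicode"))
# <p>Hello <b>world</b></p>
```

In XSLT terms, the second case corresponds roughly to copying element nodes
into the result tree rather than emitting a string and disabling escaping.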

Another point is that CDATA sections are only meaningful to the XML parser;
they only disambiguate character data that isn't interrupted by markup from
character data that is. "<foo><![CDATA[1 & 2 are < 3]]></foo>" is exactly the
same as "<foo>1 &amp; 2 are &lt; 3</foo>": once parsed, both yield a foo
element containing the same text, and the XPath data model never sees the
difference. So there is no advantage to using a CDATA section unless your XML
document creation involves the questionable practice of serially pasting
together unstructured strings.
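A quick check with Python's xml.etree.ElementTree (again just an illustrative
parser choice) shows that the two forms above produce identical character
data, and that re-serializing the parsed tree keeps no trace of the CDATA
section:

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring("<foo><![CDATA[1 & 2 are < 3]]></foo>")
with_refs = ET.fromstring("<foo>1 &amp; 2 are &lt; 3</foo>")

# After parsing, both documents yield the same character data; the
# parser does not report which lexical form was used.
print(with_cdata.text == with_refs.text)  # True
print(repr(with_cdata.text))              # '1 & 2 are < 3'

# Serializing the tree escapes the text; the CDATA section is gone.
print(ET.tostring(with_cdata, encoding="unicode"))
# <foo>1 &amp; 2 are &lt; 3</foo>
```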

   - Mike
____________________________________________________________________________
  mike j. brown, fourthought.com  |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list