RE: [xsl] CDATA Handling

Subject: RE: [xsl] CDATA Handling
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 07 Jan 2009 11:48:24 -0500
Hi again,

At 07:55 PM 1/6/2009, Ken wrote:
In my opinion, the comment and processing instruction annotations should be able to be completely removed from an XML document without changing its information. Yes, the processing of the information might change, but nothing would be lost.

I recognize that the suggestion uses start and end annotations to delimit the information, so the argument could be made that even that route to this issue would lose some information if the annotations were removed.

But at least no data would be lost, and it could still be recovered, if all annotations were stripped.

When I teach XML I distinguish the two annotations as typically "comments for humans" and "PIs for programs" and, in keeping with the fact that a DTD cannot constrain annotations, I hold that there should be no inherent information *in* annotations. Data integrity should not be lost if all annotations are removed.

After all, the built-in template rules in XSLT do nothing with annotations. That says a lot to me.

But that's just my perspective.

I'm on the fence on this one. I like the neatness of Scott's suggestion, but I also generally agree with Ken (while noting that he seems to be using the term "annotation" in a formal technical sense in this context, which I'm not sure it properly has :-).

That being said (and also in response to Evan), I think it worth recognizing that what we're doing here is debating a design problem that is constrained in the following ways:

* We assume the image data has to be represented in Base64 in line, not out of line by reference
* We assume that no element is available for representing this data properly and discretely within the document hierarchy
* We assume that the schema cannot be modified in order to provide us such an element

All these are given for this problem (as per the OP), but are not the case in all such problems. Indeed, since relaxing any of them would provide us with a better answer architecturally, we are reduced, in effect, to discussing what the best possible kluge would be, given that more correct solutions aren't possible.

And it's in the nature of kluges not to be ideal, in fact to present exactly the sort of awkward tradeoff we're seeing in the suggestions here and the responses they're eliciting. That's what makes them kluges. You pays your money and you takes your chances. :-)

One thing we've agreed on, however, is that CDATA marked sections aren't a solution, as they are neither reliable for doing the job called for nor appropriate given their semantics.
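
For anyone who wants to see the reliability problem concretely, here is a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree as the parser (any conforming parser should behave similarly), and the `image` element name and its payload are hypothetical, chosen only for illustration. It shows that the same character data arrives whether or not it was wrapped in a CDATA marked section, and that the marker does not survive a round trip.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name and payload, used only for illustration.
with_cdata = "<image><![CDATA[a<b&c]]></image>"
escaped = "<image>a&lt;b&amp;c</image>"

# Parse both forms and pull out the character data.
cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(escaped).text

# The parser hands back identical text; the CDATA boundary is not reported.
same_text = cdata_text == escaped_text

# Reserializing the CDATA version emits ordinary escaped text instead.
round_tripped = ET.tostring(ET.fromstring(with_cdata), encoding="unicode")

print(same_text)
print(round_tripped)
```

Since a downstream tool sees no difference between the two inputs, a CDATA section cannot be counted on to flag where the Base64 data starts and stops.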


Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
