[xsl] best way to store ad-hoc HTML data

Subject: [xsl] best way to store ad-hoc HTML data
From: Terence Kearns <terencek@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 23 Oct 2003 11:42:34 +1000
As far as I can see, there are two ways (not including any binary/base64 based methods).
1) convert the HTML to well-formed markup (or verify that it already is) and confine it to a dedicated element
2) shove it into a CDATA section and pretend there is no markup there

Option 1:
has the disadvantage of requiring a pre-processor such as HTML Tidy to try to ensure well-formedness. Even then, there is no guarantee that the process will work, and the input may still be rejected.
On the other hand, there is the advantage of being able to access the content semantics provided by the HTML markup.
In the past I have done this and then run an identity transform on the subtree (HTML) of the dedicated HTML content element.
<xsl:template match="//myXHtmlContent//node() | @*" mode="copyXHtml">
This allows me to "process" some of the markup and apply rules such as deleting any <blink/> elements or whatever.
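
A minimal sketch of such an identity transform, assuming XSLT 1.0 and assuming the HTML lives in a wrapper element named myXHtmlContent with its elements in no namespace (if the content is in the XHTML namespace, the "blink" pattern needs a namespace prefix). The stylesheet layout and the blink rule are illustrative only, not a definitive implementation:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: hand the children of the wrapper element to the copy mode. -->
  <xsl:template match="myXHtmlContent">
    <xsl:apply-templates select="node()" mode="copyXHtml"/>
  </xsl:template>

  <!-- Identity rule: copy every node and attribute unchanged. -->
  <xsl:template match="node() | @*" mode="copyXHtml">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="copyXHtml"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: an empty template drops blink elements together with
       their content. It wins over the identity rule because a name test
       has a higher default priority than node(). -->
  <xsl:template match="blink" mode="copyXHtml"/>

</xsl:stylesheet>
```

To drop only the blink tags but keep the text inside them, the empty template could instead contain <xsl:apply-templates select="node()" mode="copyXHtml"/>.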

Option 2:
has the advantage of safely accepting *all* [suitably encoded] content provided the "]]>" character sequence is escaped. The disadvantage is that the content is now dead-end data: to XSLT it is just a text node, so the markup inside it cannot be selected or transformed. Also, when transforming it with XSLT, you have to remember to set disable-output-escaping="yes" if you plan on sending that HTML content to a browser for rendering.
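
A hedged sketch of the XSLT side of option 2, assuming the HTML is stored as text in a hypothetical element named htmlBlob (the element name and the surrounding div are illustrative). Note that disable-output-escaping is an optional feature in XSLT 1.0: a processor may ignore it, and it only takes effect when the processor serializes the result itself rather than handing a tree to another component.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- htmlBlob is a hypothetical element whose text content is HTML
       that was stored inside a CDATA section. -->
  <xsl:template match="htmlBlob">
    <div class="stored-html">
      <!-- Emit the text verbatim so "<" and "&" are not re-escaped. -->
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

On the input side, the usual way to escape a literal "]]>" is to split it across two adjacent CDATA sections, so that the parser reassembles the original text:

```xml
<htmlBlob><![CDATA[<p>unclosed <br> and a literal ]]]]><![CDATA[> sequence</p>]]></htmlBlob>
```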

So there are disadvantages with both options. I'd like to know how other people have approached this problem, and I'm keen on any advice. In particular, if people have used option 1, what is the best way to ensure ad-hoc HTML becomes well-formed (assuming you have no control over the composition domain)?

Terence Kearns ~ ph: +61 2 6201 5516
IT Database/Applications Developer
Enterprise Information Systems
Client Services Division
University of Canberra

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
