[xsl] CDATA or escape in the result tree problems

Subject: [xsl] CDATA or escape in the result tree problems
From: Kjetil Kjernsmo <kjetil@xxxxxxxxxxxx>
Date: Mon, 30 Oct 2006 09:35:31 +0100

Hi all!

I'm trying to integrate TinyMCE, a JavaScript WYSIWYG editor, into my
system. TinyMCE produces XHTML, and can be attached to textareas.
Textareas in HTML cannot contain other HTML elements, thus the problem
arises that the HTML needs to be escaped or put into a CDATA section.

At first, I thought this was going to be a straightforward application
of cdata-section-elements="textarea", and that is also what is
indicated in the FAQ. It does not work as expected, however.

I use Perl's XML::LibXSLT, which uses GNOME's libxslt.
My test system is Ubuntu Dapper, with the versions 1.58-1 and 1.1.15
respectively. My production system is Debian Sarge, with somewhat older
libraries. I haven't tested there yet.

It may be relevant that my main stylesheet has the following output
element:

  <xsl:output version="1.0" encoding="utf-8" indent="yes"
    method="html" media-type="text/html"
    doctype-public="-//W3C//DTD HTML 4.01//EN"
    doctype-system="http://www.w3.org/TR/html4/strict.dtd"
    cdata-section-elements="textarea"
    />

which imports a stylesheet match-control.xsl that contains

  <textarea name="{@name}" id="{@name}"
    rows="{@rows}" cols="{@cols}">
    <xsl:copy-of select="./ct:value/*/*"/>
  </textarea>

This outputs the HTML as-is, neither wrapped in a CDATA section nor
escaped. If I run the resulting code through the W3C HTML validator, it
complains that it is invalid. I'd like the nodes produced by
<xsl:copy-of select="./ct:value/*/*"/> to be put into a CDATA section,
or perhaps escaped.

Any ideas why this is so? Is it because I import this stylesheet? Is it
a weakness with libxslt? Or have I misunderstood
cdata-section-elements?

Now, I assume that a CDATA section is The Right Way To Do It, but I
don't know if TinyMCE thinks likewise. I have seen it simply escape <
and > to &lt; and &gt;, and still submit it back as proper HTML.
So, I figured, maybe I should be more pragmatic about it (I'm normally
such a purist), and just escape them too...

The problem is that I allow users to insert pretty much any HTML in
there. My application does some validation and a bit of cleanup, so it
should be valid, but that makes it slightly harder to write the
templates that would have to match it if I were to just escape the
HTML. If the CDATA approach above seems hard to do, I would be happy
for advice on how to do this as well.
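
To make the escaping alternative concrete, here is a minimal sketch of
how it might look in XSLT 1.0. The mode name "escape" is a hypothetical
choice, and the sketch makes simplifying assumptions: namespace
declarations are not reproduced, empty elements come out as a start and
end tag pair, and text or attribute values that themselves contain &,
< or a double quote would need an extra replace pass that is left out
here.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical mode "escape": emit each element as text nodes, so
       the serializer writes its angle brackets as &lt; and &gt;. -->
  <xsl:template match="*" mode="escape">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:apply-templates select="@*" mode="escape"/>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates select="node()" mode="escape"/>
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

  <!-- Attributes become name="value" text after the element name.
       Assumption: values contain no double quote, & or <. -->
  <xsl:template match="@*" mode="escape">
    <xsl:text> </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>="</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>"</xsl:text>
  </xsl:template>

  <!-- Text is copied through unchanged.
       Assumption: it contains no & or < of its own. -->
  <xsl:template match="text()" mode="escape">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With templates like these in scope, the textarea body would use
<xsl:apply-templates select="./ct:value/*/*" mode="escape"/> in place
of the xsl:copy-of. Because a single generic match="*" template covers
every element, no per-element templates are needed for whatever HTML
the users insert.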

Cheers,

Kjetil
--
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil@xxxxxxxxxxxx
Homepage: http://www.kjetil.kjernsmo.net/     OpenPGP KeyID: 6A6A0BBC
