Re: [xsl] Content of Script element getting wrapped by CDATA

Subject: Re: [xsl] Content of Script element getting wrapped by CDATA
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 23 Oct 2008 15:41:06 -0400
Hi,

At 10:56 AM 10/23/2008, Darcy wrote:
> It does seem complicated to hide the CDATA section... but it is
> necessary if you have invalid XML characters in your javascript code
> and do not wish to escape them individually.
>
> No, XSLT would do this automatically, doing it differently in xml or
> html output.

> This took me a bit to get:
>
> If output is set to html and I remove the script template, then XSLT
> (Saxon at least) does not escape the output.

Yes: Saxon is following rules for HTML serialization that account for this.


If not all processors do, it is partly because the rules are complex and hard to follow without knowing the rationale behind them. It does not help that HTML is close enough to XML that it "seems" like it should "just work". If the two did not use such similar markup syntaxes, it would be easier to accept that they make entirely different assumptions about whether, how, and when client applications may process text content raw (that is, without unescaping characters that the markup language would otherwise require to be escaped).
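
A minimal sketch of the difference between the two sets of serialization rules. It uses the ElementTree serializer from Python's standard library rather than Saxon (an assumption made only to keep the example self-contained), and the script body is made up for illustration. The same tree is written out once under XML rules and once under HTML rules:

```python
import xml.etree.ElementTree as ET

# The input has to be well-formed XML, so the (hypothetical) script body
# is protected by a CDATA marked section.
SOURCE = (
    '<html><head><script type="text/javascript">'
    '<![CDATA[if (a < b && c) { run(); }]]>'
    '</script></head></html>'
)

root = ET.fromstring(SOURCE)

# XML rules: markup characters in text content are always escaped.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML rules: the content of script (and style) is written out raw.
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

The tree is identical in both cases; only the serialization method changes what ends up inside the script element.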

HTML and XML both use angle brackets for historical reasons -- both emerge out of SGML. By the mid-to-late 1990s, HTML had started leaning on SGML specs for dealing with this sort of problem. The notion of a "CDATA element" (an element whose content is known not to include any markup, and that therefore can contain unescaped markup characters as long as the end tag is not encountered) is an SGML concept. It was left on the floor when XML was designed as a simplification of SGML. Part of the reason is that CDATA elements preclude processing arbitrary markup instances without foreknowledge of the particular tagging rules of their grammars (represented in SGML by DTDs): a parser has to know in advance which elements are declared CDATA before it can tell markup from text inside them.
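
To make the "CDATA element" idea concrete, here is a small sketch using the parsers in Python's standard library (chosen only as convenient stand-ins; the script text is invented). The HTML parser knows ahead of time that script content is raw text, while the XML parser has no such foreknowledge and rejects the same input:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical script with unescaped markup characters in its body.
SCRIPT = '<script>if (a < b) { s = "<b>"; }</script>'


class Recorder(HTMLParser):
    """Record the start tags and text that the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


rec = Recorder()
rec.feed(SCRIPT)
rec.close()

html_tags = rec.tags             # start tags seen by the HTML parser
html_text = "".join(rec.text)    # script body, passed through untouched

# An XML parser has no built-in knowledge of script, so the bare "<" is an error.
try:
    ET.fromstring(SCRIPT)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print(html_tags, repr(html_text), xml_ok)
```

The HTML parser never reports a start tag for the "<b>" inside the string literal, because everything up to the script end tag is treated as text.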

> I still need to escape
> the input or enclose it in a CDATA so that the input is well formed
> XML. But the output is unescaped as required.  So I now understand
> what you meant by not using the CDATA in the output. (One limitation
> may be that you can't process it as XML again because the output
> content is not escaped.... unless you happen to have an easy way to
> convert the HTML back to well formed XML.  Or have an XSLT processor
> that can use an HTML parser.)

That's correct: if you serialize your result tree in a non-XML syntax, such as HTML, you can't expect to process it again as XML, at least not without converting back again first.


> But if the output is set to XML, then it does seem necessary to use a
> javascript comment on the CDATA because even though the output is set
> to strict XHTML, the browser I tried (firefox) doesn't seem to
> understand CDATA in the script element.

The XHTML 1.0 Recommendation explicitly states that a CDATA marked section should be expected there (Section 4.8), so an XHTML processor that knows it has XHTML ought to do the right thing. (Famous last words.)


> Hence the complicated
> workaround of putting a javascript comment in front of the CDATA.  I
> wish browsers would just understand CDATA when the doctype is xhtml
> (i.e. XML).

Indeed.
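
For reference, a sketch of the workaround in its commonly used form, where both CDATA delimiters sit behind JavaScript line comments. The script body is made up, and Python's standard-library parsers stand in for an XML-aware client and for a browser treating the page as HTML (presumably what happens when XHTML is served as text/html):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical script element using the commented-CDATA workaround.
XHTML_SCRIPT = (
    '<script type="text/javascript">//<![CDATA[\n'
    'if (a < b && c) { run(); }\n'
    '//]]></script>'
)

# XML view: the CDATA delimiters are markup and disappear, leaving two
# empty "//" line comments around the code.
xml_view = ET.fromstring(XHTML_SCRIPT).text


class ScriptText(HTMLParser):
    """Collect the text the HTML parser reports for the script element."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = ScriptText()
parser.feed(XHTML_SCRIPT)
parser.close()

# HTML view: the delimiters are ordinary characters, but each one sits
# behind "//" and so is only a comment to the JavaScript engine.
html_view = "".join(parser.chunks)

print(repr(xml_view))
print(repr(html_view))
```

Either way the JavaScript engine receives the same statement, with the delimiter lines reduced to comments.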


> So I like the idea of just not using the CDATA and using html output
> (even if doctype is set to XHTML). But it still seems to have some
> weaknesses.

It does -- several.


This is the kind of problem that comes up on the boundaries between systems. Here we have at least XML, HTML and JavaScript to worry about. Given the necessity of mastering so much in order to get things to work right, one can safely predict that sometimes software developers will slip up.

On the other hand -- if you use a correct serializer such as Saxon's, you can serve up HTML that browsers can deal with. Or, if you prefer to serve up XHTML, you can do that, and conformant XHTML clients will know what to do. Take your pick. :->

Cheers,
Wendell



======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
