Subject: Re: GOTCHA!
From: "Oren Ben-Kiki" <oren@xxxxxxxxxxxxx>
Date: Fri, 15 Jan 1999 12:31:52 +0200
James Clark <jjc@xxxxxxxxxx> wrote:
>I wrote:
>> Now this is a hack! You stepped on another XT bug here - or a specs bug. I
>> checked the following:
>>
>> <xsl:template match="A">
>> <xsl:pi name="JavaScript">
>> <xsl:text><![CDATA[<&>]]></xsl:text></xsl:pi>
>> </xsl:template>
>>
>> And got in the result:
>>
>> <?JavaScript <&>>
>
>Not an XT bug or a specs bug. You would get
>
><?JavaScript <&>?>
>
>which *is* well-formed. Remember that in XML a PI is terminated by ?>.

Checked again. XT version 0.5 emits '>' and not '?>' at the end of a PI.
Also, it would emit '?>' inside the content without converting it to '? >'
as per the spec.

But you are right - there are three constructs which avoid markup: comments,
PIs and CDATA, and we already have access to two of them. I'd still argue
that being able to generate all possible XML text files (as opposed to all
possible XML in-memory representations) has its value, but I understand why
that would be lower priority.

>How often will you get ?> in Javascript? Less often than ]]> I suspect.

I believe that '?>' isn't valid JavaScript. It might appear in strings, of
course... But strings in embedded scripts are a whole painful issue by
themselves :-)

>> Or could I expect that an XML/XSL processor to be smart enough to use
>> different character quoting rules within a <SCRIPT> tag?
>
>Right.

I've tried to understand how this works - it does work, to my great
surprise. I went back to the documentation...

The XML spec insists that unadorned '<' and '&' can appear only inside CDATA
sections, a PI, or a comment (section 2.4). Section 2.7 describes CDATA
sections and makes it clear they always begin with "<![CDATA[" and end with
"]]>". Section 3.2 discusses element types. It lists '#PCDATA' as a possible
type in 3.2.2 (without giving its definition, or even a link to where it is
defined - strange). It does _not_ list 'CDATA' as a valid type. XSL is
expected to always emit valid XML. And yet...
The HTML 4.0 spec does specify CDATA as the value type for the SCRIPT element
(and many other things), with a link to the _SGML_ standard. Obviously HTML
4.0 isn't XML. Yet it is a valid result-ns for XSL, and the XT processor
emits what seems to be CDATA for SCRIPT tags. Should be illegal...

The explanation is in section 2.2 of the XSL spec. In an editorial note it
states that it is possible to use the result-ns to specify non-XML output,
and lists HTML as an example. Although this is just an editorial note, _it
explicitly caters to non-XML output_. Who said the W3C isn't responsive?
They are just being shy about it, so they put it in small letters :-) In
fact, it is a very elegant way of solving the problem - it limits the damage
to a single attribute of a single tag. Neat!

Even better, this trick has the potential to settle this issue once and for
all. Consider adding an 'http://www.w3c.org/TR/rec-cdata' result-ns. This
result-ns would specify that all output elements have the content type
'CDATA', so that any text emitted by the stylesheet would not be marked up,
ever. This can't be done in an XML DTD, but neither can the HTML one.
Stylesheets using this result-ns would probably not bother to generate
elements, anyway; by using just <xsl:text> etc. they'll generate output in
arbitrary formats - without changing anything in the XSL standard itself.

>> It would also have
>> to examine the LANGUAGE attribute for it...
>
>Huh? SCRIPT in HTML 4.0 is an SGML CDATA element, which means that when
>outputting it, & and < must not be escaped to &amp; and &lt;. This is
>independent of the scripting language.

Right. Sorry. I was thinking about the quoted strings problem - the need to
take some text and quote it so that it may be safely embedded in a scripting
language string; this would be different between scripting languages. It's
really a variant of the arbitrary text formatting issue. If
<xsl:ecmascript-string> is unacceptable, how about adding a perl-like regexp
capability to <xsl:text>?
<xsl:text transform='s:["\\]:\\&:g'> would do wonders :-)

BTW, a final hack which works in XT, if the result-ns is HTML, and would
probably work in other processors as well:

<xsl:template match="...">
  <SCRIPT>
    <xsl:text><![CDATA[</SCRIPT>]]>
    Anything you want - &lt;, &amp;, &gt;
    <![CDATA[<SCRIPT>]]></xsl:text>
  </SCRIPT>
</xsl:template>

Emits:

<SCRIPT></SCRIPT>
Anything you want - <, &, >
<SCRIPT></SCRIPT>

Where there's a will, there's a way :-)

    Oren Ben-Kiki

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
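
For reference, the substitution proposed above, s:["\\]:\\&:g, puts a backslash in front of every double quote and every backslash (reading '&' sed-style, as "the matched text"). A minimal sketch of that behaviour, written in Python purely for illustration: the function name is made up, and no 'transform' attribute of this kind exists in XSL or XT. It only mirrors the substitution itself; a real string quoter for embedded scripts would probably also have to deal with newlines and with a literal </SCRIPT> inside the text.

```python
import re

# Hypothetical helper, not part of XSL or XT: mirrors the proposed
# substitution s:["\\]:\\&:g from the message above.
_NEEDS_BACKSLASH = re.compile(r'["\\]')  # a double quote or a backslash


def quote_for_script_string(text: str) -> str:
    """Prefix every double quote and backslash in text with a backslash."""
    # In the replacement template, \\ is a literal backslash and \g<0>
    # is the whole match (the sed-style '&').
    return _NEEDS_BACKSLASH.sub(r'\\\g<0>', text)


if __name__ == "__main__":
    # Input is:  He said "hi" \ bye
    print(quote_for_script_string('He said "hi" \\ bye'))
    # prints:    He said \"hi\" \\ bye
```

Characters such as '<' and '&' pass through untouched, which is what you want inside an HTML SCRIPT element, where the processor must not escape them anyway.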