Non-XML textual output formats

Subject: Non-XML textual output formats
From: James Clark <jjc@xxxxxxxxxx>
Date: Sun, 17 Jan 1999 10:34:34 +0700
Oren Ben-Kiki wrote:

> Consider adding an 'http://www.w3c.org/TR/rec-cdata' result-ns. This
> result-ns would specify that all output elements have the content type
> 'CDATA', so that any text emitted by the stylesheet would not be marked up,
> ever. This can't be done in an XML DTD, but neither can the HTML one.
> Stylesheets using this result-ns would probably not bother to generate
> elements, anyway; by using just <xsl:text> etc. they'll generate output in
> arbitrary formats - without changing anything in the XSL standard itself.

I agree that being able to use XSL to generate non-XML textual formats
would be useful, and result-ns provides a means to do so.

However, I don't think it's quite as simple as just saying that all
elements are CDATA, in other words that all character data gets output
directly.  Many output formats follow the general pattern of having data
and control information; certain special characters introduce control
information and these special characters have to be escaped when used as
data.  What's needed is a result namespace that can deal with this
escaping issue.
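As a hypothetical illustration of that pattern (RTF is chosen here only as an example format, not one named in this thread), the following Python sketch treats output as a sequence of control and data segments. In RTF, backslash and braces introduce control information, so they are escaped when they occur in data, while control segments pass through untouched. Simply emitting all character data directly would corrupt the braces and backslash in the data segment below.

```python
# Hypothetical illustration: RTF-style output, where "\", "{" and "}"
# introduce control information and must be escaped when they occur in data.
RTF_ESCAPES = {"\\": "\\\\", "{": "\\{", "}": "\\}"}


def escape_data(text: str, escapes: dict[str, str]) -> str:
    """Replace each special character in data text with its escaped form."""
    return "".join(escapes.get(ch, ch) for ch in text)


def render(segments: list[tuple[str, str]], escapes: dict[str, str]) -> str:
    """Concatenate (kind, text) segments; only "data" segments are escaped."""
    out = []
    for kind, text in segments:
        out.append(escape_data(text, escapes) if kind == "data" else text)
    return "".join(out)


doc = [
    ("control", "{\\rtf1 "),
    ("data", "Path: C:\\temp {draft}"),
    ("control", "\\par}"),
]
print(render(doc, RTF_ESCAPES))  # prints {\rtf1 Path: C:\\temp \{draft\}\par}
```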

Here's what the DTD for such a result namespace might look like:

<!ELEMENT nxml (escape*, (control|data)*)>
<!ATTLIST nxml encoding NMTOKEN "UTF-8">
<!ELEMENT escape (#PCDATA|char)*>
<!ATTLIST escape char CDATA #REQUIRED>
<!ELEMENT control (#PCDATA|char|data|control)*>
<!ELEMENT data (#PCDATA|data|control)*>
<!ELEMENT char EMPTY>
<!ATTLIST char number NMTOKEN #REQUIRED>

The nxml element is the root element; the encoding attribute is a MIME
charset to be used for encoding characters as bytes.

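The data element contains data.  Within a data element, control
characters get escaped.  The escape element specifies how a particular
control character gets escaped: its char attribute identifies the
character, and its content gives the text to output in its place.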

The control element contains control information.  Within a control
element, all characters are output directly without escaping.

The char element allows the output of a character that is not allowed by
XML (such as control-L).
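A minimal Python sketch of how a processor might serialize a result tree in this namespace, under assumptions the DTD above does not fix: each escape element's char attribute names a single character and its content is that character's replacement; char's number attribute is a decimal Unicode code point; and the encoding value is a charset name Python's codecs recognize. The function serialize_nxml and the RTF-style sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def _emit(text, escaping, escapes):
    """Output character data, applying the escape table only inside data."""
    if not text:
        return ""
    if escaping:
        return "".join(escapes.get(ch, ch) for ch in text)
    return text


def _render(el, escaping, escapes):
    """Render one element's mixed content (ElementTree text and tails)."""
    if el.tag == "char":
        # Assumption: number is a decimal code point, e.g. 12 for control-L.
        return chr(int(el.get("number")))
    if el.tag == "data":
        escaping = True
    elif el.tag == "control":
        escaping = False
    parts = [_emit(el.text, escaping, escapes)]
    for child in el:
        parts.append(_render(child, escaping, escapes))
        # A child's tail belongs to the enclosing element's mode.
        parts.append(_emit(child.tail, escaping, escapes))
    return "".join(parts)


def serialize_nxml(xml_text: str) -> bytes:
    root = ET.fromstring(xml_text)
    if root.tag != "nxml":
        raise ValueError("expected an nxml root element")
    # Assumption: char names one character; the content is its replacement.
    escapes = {}
    for esc in root.findall("escape"):
        escapes[esc.get("char")] = _render(esc, False, {})
    # Whitespace between the root's children is ignored (element content).
    body = "".join(
        _render(c, False, escapes) for c in root if c.tag in ("control", "data")
    )
    return body.encode(root.get("encoding", "UTF-8"))


sample = r"""<nxml encoding="US-ASCII">
  <escape char="\">\\</escape>
  <escape char="{">\{</escape>
  <escape char="}">\}</escape>
  <control>{\rtf1 </control>
  <data>Path: C:\temp {draft}</data>
  <control>\par<char number="12"/>}</control>
</nxml>"""

print(serialize_nxml(sample))
```

Here the data element's backslash and braces are escaped, the control elements pass through verbatim, and the char element emits a form feed (control-L), which could not appear literally in the XML source.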

Is anything else needed?

James


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

