Re: & in SGML vs XML

Subject: Re: & in SGML vs XML
From: "Christopher R. Maden" <crism@xxxxxxxxxx>
Date: Sun, 05 Nov 2000 23:47:35 -0800
At 12:48 4-11-2000 +0100, Matthias Häußer wrote:
> I have another tricky &-related question:

It's not XSL-related, and is better suited for XML-L (mail listserv@xxxxxxxxxxxxxxxxxx, no subject, body "subscribe xml-l") or comp.text.xml.

> I have SGML documents which can easily be converted to XML by just
> exchanging the declaration in the first line(s),
> except for that they contain &'s standing alone, as in
> <line>you & me</line>.
>
> This is legal in SGML, but XML parsers and XT do not accept it.
> Is there a way of getting this right except for string replacement
> (& -> &amp;)? (Which is tricky because "real" entities like &Ccaron;
> must not be destroyed.)
> James Clark's sx does it alright, but I'd prefer a Java solution
> (ideally, one line of declaration either in the stylesheets or the XML).
>
> In other words: Is there a way of treating an XML document like
> <line>you & me</line>?

An ampersand is recognized as a "delimiter in context", meaning that it is treated as markup only when it is followed by a name start character (see production [59] of ISO 8879); anywhere else it is plain data. Assuming your SGML used the reference concrete syntax, you could do something like

s/&\([^a-zA-Z]\)/\&amp;\1/g # ampersand followed by innocuous character
                            # is replaced by &amp; and character
s/&$/\&amp;/                # ampersand at end of line is replaced by
                            # &amp;
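
Since a Java solution was preferred, the same substitution might be sketched in Java roughly as follows. This is only a sketch under the same reference-concrete-syntax assumption, not a replacement for a real SGML parser; the class and method names (AmpersandFixer, escapeBareAmpersands) are made up for illustration. Unlike the sed expressions, it also leaves "&#" followed by a digit or letter alone, on the assumption that numeric character references should survive.

```java
import java.util.regex.Pattern;

public class AmpersandFixer {
    // Hypothetical helper, not part of any library.
    // An ampersand is left alone only when it starts something SGML would
    // recognize as a reference: a letter (entity reference), or '#'
    // followed by a digit or letter (character reference; this case is an
    // assumption added here and is not covered by the sed expressions).
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?![A-Za-z]|#[0-9A-Za-z])");

    // Replaces every ampersand that does not start a reference with &amp;.
    public static String escapeBareAmpersands(String text) {
        return BARE_AMP.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints <line>you &amp; me</line>
        System.out.println(escapeBareAmpersands("<line>you & me</line>"));
        // prints <line>&Ccaron; &amp;&amp; &#200;</line>
        System.out.println(escapeBareAmpersands("<line>&Ccaron; && &#200;</line>"));
    }
}
```

The negative lookahead consumes nothing, so adjacent ampersands and an ampersand at the very end of the input are each examined and escaped. Like the sed version, it does not check that a reference ends in a semicolon, which XML requires and SGML does not.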

See <URL:> for the SGML formal productions, but they aren't very useful without the text of the Standard.

Christopher R. Maden, Senior XML Analyst, Lexica LLC
222 Kearny St., Ste. 202, San Francisco, CA 94108-4510
+1.415.901.3631 tel./+1.415.477.3619 fax
