Re: & in SGML vs XML

Subject: Re: & in SGML vs XML
From: "Christopher R. Maden" <crism@xxxxxxxxxx>
Date: Sun, 05 Nov 2000 23:47:35 -0800
At 12:48 4-11-2000 +0100, Matthias Häußer wrote:
I have another tricky &-related question:

It's not XSL-related, and is better suited for XML-L (mail listserv@xxxxxxxxxxxxxxxxxx, no subject, body "subscribe xml-l") or comp.text.xml.


I have SGML documents which can easily be converted to XML by just
exchanging the declaration in the first line(s),
except that they contain &'s standing alone, as in
<line>you & me</line>.

This is legal in SGML, but XML parsers and XT do not accept it.
Is there a way of getting this right except for string replacement
(& -> &amp;)? (Which is tricky because "real" entities like &Ccaron;
must not be destroyed.)
James Clark's sx does it alright, but I'd prefer a Java solution
(ideally, one line of declaration either in the stylesheets or the XML).

In other words: Is there a way of treating an XML document like
<line>you & me</line>?

An ampersand is recognized as a "delimiter in context", meaning that it is only treated as the start of an entity reference when it is followed by a name start character (see production [59] of ISO 8879). Assuming your SGML used the reference concrete syntax, you could do something like the following sed-style substitutions:


s/&\([^a-zA-Z#]\)/\&amp;\1/g # ampersand followed by innocuous character
                             # (not a letter, and not # as in &#38;)
                             # is replaced by &amp; and character
s/&$/\&amp;/                 # ampersand at end of line is replaced by
                             # &amp;
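
Since the question asked for a Java solution, here is a minimal sketch of the same idea in Java. It is not part of the original sed script: the class name AmpersandEscaper and the method escapeBareAmpersands are made up for illustration, and it assumes the reference concrete syntax, where a name start character is an ASCII letter. It uses a negative lookahead, so consecutive ampersands, an ampersand at the end of the input, and numeric character references such as &#38; are all handled in one pass.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches an ampersand that does NOT open a reference, i.e. one that is
    // not followed by a letter (entity reference such as &Ccaron;) and not
    // followed by "#" plus a digit or letter (character reference such as
    // &#38;). Assumption: reference concrete syntax, ASCII name start chars.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?![A-Za-z]|#[0-9A-Za-z])");

    // Hypothetical helper: returns the text with every bare ampersand
    // replaced by &amp;, leaving real references untouched.
    public static String escapeBareAmpersands(String text) {
        return BARE_AMPERSAND.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints: <line>you &amp; me &Ccaron;</line>
        System.out.println(
                escapeBareAmpersands("<line>you & me &Ccaron;</line>"));
    }
}
```

Like the sed version, this is plain text substitution applied before the document reaches an XML parser; it knows nothing about CDATA marked sections, comments, or a non-default concrete syntax, so those would need separate handling.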

See <URL:http://www.oreilly.com/%7Ecrism/sgmldefs.html> for the SGML formal productions, but they aren't very useful without the text of the Standard.

-Chris
--
Christopher R. Maden, Senior XML Analyst, Lexica LLC
222 Kearny St., Ste. 202, San Francisco, CA 94108-4510
+1.415.901.3631 tel./+1.415.477.3619 fax
<URL:http://www.lexica.net/> <URL:http://www.oreilly.com/%7Ecrism/>


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


