Re: More entity confusion and my opinion on the right way

From: Chris Maden <crism@xxxxxxxxxxx>
Date: Tue, 5 Jan 1999 23:25:51 -0500 (EST)
Summary: XSL does XML-to-XML.  It (XT at least) does it properly.
Andrew Bunner worries too much... or lives in the real world.

[Andrew Bunner]
>   Here's what I don't understand about the entity-writing problem...
> 
>   If I include &amp; in my source document or stylesheet, I get
> &amp; in my result document (this is good since my result document
> is HTML).

Yes.  The &amp; is parsed, and resolved to &.  Upon generating the
result document, the character & is *represented* as &amp;, the
built-in XML entity.  The character has not been changed.

>   If I include &quot; in my source document or stylesheet, I get "
> in the result document. Hmm. OK.

Again, the &quot; (another built-in entity) is parsed and resolved to
".  It is mandatory to represent & as something other than itself, but
not so ", so the processor emits it literally.  Once again, the
character has not been changed, only its *representation*.  (Cf. Lewis
Carroll on "Haddock's Eyes".)
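A minimal sketch of that parse-then-serialize round trip, using Python's standard xml.etree.ElementTree as a stand-in for the XSL processor (the sample text and variable names are made up for illustration, and other serializers may make different but equally legal choices):

```python
import xml.etree.ElementTree as ET

# Hypothetical input that uses both built-in entity references.
source = '<p>AT&amp;T said &quot;hi&quot;</p>'

elem = ET.fromstring(source)

# After parsing, only the characters remain; the references are gone.
parsed_text = elem.text

# On output the serializer must escape "&", but may write '"' literally
# in element content.
serialized = ET.tostring(elem, encoding="unicode")

print(parsed_text)
print(serialized)
```

The parsed text holds a bare & and bare quotation marks; only the ampersand comes back out as an entity reference.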

>   Of even greater confusion to me is if I include &copy; in my
> source document.  In this case, I get an error... &copy; references
> an undefined entity.

If you didn't define it, then it is indeed undefined.  It's not built
into XML; only amp, quot, apos, lt, and gt are.  You could put a
declaration for &copy; in the prolog of either the XML document with
your data or the XML document with your XSL stylesheet (depending on
where you want to use the entity reference).
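As a rough illustration, here is the same document with and without a declaration in the prolog, again parsed with Python's xml.etree.ElementTree (an assumption of this sketch; the document text and variable names are invented):

```python
import xml.etree.ElementTree as ET

# No declaration for "copy": the reference cannot be resolved.
undefined = "<doc>&copy; 1999</doc>"

# The same document with the entity declared in the internal subset
# of the prolog.
defined = '<!DOCTYPE doc [<!ENTITY copy "&#169;">]><doc>&copy; 1999</doc>'

try:
    ET.fromstring(undefined)
    undefined_parsed = True
except ET.ParseError:
    # The parser rejects the reference to an undeclared entity.
    undefined_parsed = False

# With the declaration in place, the reference resolves to U+00A9.
defined_text = ET.fromstring(defined).text

print(undefined_parsed)
print(defined_text)
```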

> BUT, if I include &#169; in my source document, I get &copy; in the
> result document.  'Hooray!' I think to myself, this must mean that
> if I include an entity reference to an ascii code, it will get
> translated to the right HTML entity.

With what processor do you receive &copy;?  It either knows something
about your result DTD (possibly HTML), or it's broken.

>   Wrong.
> 
>   &#32; (space) gets translated to the space character, not to &nbsp;

I would be appalled if you got &nbsp; out of that, since the
non-breaking space is 160 in both ISO 8859-1 and Unicode; if my spaces
became non-breaking, it would be a Bad Thing.
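A small sketch of the numbers involved, assuming Python's xml.etree.ElementTree and the html.entities name table (neither is what any XSL processor actually uses; they only illustrate the point). The numeric references resolve to code points 169, 32 and 160, and only a serializer that knows HTML's entity names could turn 169 into &copy; or 160 into &nbsp;:

```python
import xml.etree.ElementTree as ET
from html.entities import codepoint2name

# Three numeric character references: copyright sign, space,
# and non-breaking space.
elem = ET.fromstring("<p>&#169;&#32;&#160;</p>")

# What the parser hands on: plain characters with these code points.
code_points = [ord(ch) for ch in elem.text]

# A generic XML serializer restricted to ASCII falls back to numeric
# references for the non-ASCII characters; it knows no HTML names.
ascii_bytes = ET.tostring(elem, encoding="us-ascii")

# The names an HTML-aware serializer could choose instead.
# The ordinary space (32) has no HTML entity name.
html_names = {cp: codepoint2name.get(cp) for cp in code_points}

print(code_points)
print(ascii_bytes)
print(html_names)
```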

>   So, upon reflection, my post-processing solution isn't very
> good. What if I want &quot; to appear in the result document?

Then you've got dependencies outside the scope of XML.  I can't think
of any situation in any HTML or XML application where the difference
between " and &quot; is material.  Inside a double-quoted attribute
value literal, the entity reference is necessary, but you can always
use single-quotes to delimit the literal.
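For instance, these two spellings of an attribute give a parser the same value. This is a sketch using Python's xml.etree.ElementTree; the element and attribute names are invented:

```python
import xml.etree.ElementTree as ET

# Double-quoted literal: the inner quotation marks must be written
# as &quot;.
double_quoted = ET.fromstring('<a title="say &quot;hi&quot;"/>')

# Single-quoted literal: the inner quotation marks can appear as-is.
single_quoted = ET.fromstring("<a title='say \"hi\"'/>")

value_a = double_quoted.get("title")
value_b = single_quoted.get("title")

print(value_a == value_b)
print(value_a)
```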

> That entity is already defined in XML. Should I insert special
> "entity tags" like <myhack:nbsp/> and then write out "unusual"
> strings (ala ##nbsp##) for which I can grep and then replace with
> ampersand-ed references?

If the difference between various representations of the same
character makes a difference, then you're not using XML.

>   Does anyone have any thoughts?

Yes.  It may be, as I just said, that you're not using XML.  But XSL
is still a very powerful tool, and XML a powerful expression language,
and it's entirely reasonable to want to use it for non-XML ultimate
goals.

But yes, to use it beyond what was intended, you need to do more
work.  If the representation of a character matters, then it's not
really the same thing, and you shouldn't pretend it is.  Don't use
&quot; (which has a well-defined meaning in XML: "), use &myquot;
(which can be defined however you like: perhaps <myhack:quot/>).
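One way this could look in practice, as a sketch only: the namespace URI urn:example:myhack, the render() helper, and the use of Python's xml.etree.ElementTree are all assumptions made for illustration. The &myquot; entity expands to a marker element, which survives parsing as something distinct from a plain " character, and a post-processing step outside XML then writes it however it likes:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical namespace for the marker element.
MYHACK_NS = "urn:example:myhack"
MARKER = "{%s}quot" % MYHACK_NS

# &myquot; is declared to expand to the marker element <myhack:quot/>.
source = (
    '<!DOCTYPE doc [<!ENTITY myquot "<myhack:quot/>">]>'
    '<doc xmlns:myhack="urn:example:myhack">'
    "He said &myquot;hi&myquot;</doc>"
)


def render(elem):
    """Hypothetical post-processor: escape ordinary text as usual and
    write each marker element as the literal string &quot;."""
    parts = [escape(elem.text or "")]
    for child in elem:
        if child.tag == MARKER:
            parts.append("&quot;")
        parts.append(escape(child.tail or ""))
    return "".join(parts)


doc = ET.fromstring(source)
marker_tags = [child.tag for child in doc]
rendered = render(doc)

print(marker_tags)
print(rendered)
```

The marker is ordinary markup as far as XML is concerned, so nothing in the core has to be bent to carry the extra meaning.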

In short (too late!) &quot; and " are the same.  If you need something
different, use something different.  It's not wrong to want to do more
than generic XML, but don't distort the core trying to do it.  What
you want is more, so act accordingly.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

