Re: Handling Entities in MSXML

From: Mike Brown <mike@xxxxxxxx>
Date: Wed, 26 Jul 2000 11:58:54 -0700 (PDT)

ciaran byrne wrote:
> <body>Some text &dllr; </body>
> I get...
> <body>Some text &amp;dllr</body>
> What I want is:
> <body>Some text &dllr;</body>

You mentioned you don't have a DTD.

Your question boils down to "I'm referring to entities that haven't been
declared. Why aren't they working?"

The only entity references you can have in an XML document of any kind,
including XSL documents, without a DTD that declares the entities, are the
ones that are built in to XML:

  &lt;    (<)
  &gt;    (>)
  &amp;   (&)
  &apos;  (')
  &quot;  (")

These are needed so that you can differentiate markup from character data.
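
A small Python sketch may make that concrete. It uses the standard library's
xml.etree.ElementTree, which is only my choice for illustration, and the
sample string is made up. The parser turns the built-in references into plain
characters, and the serializer escapes them again on the way out:

```python
import xml.etree.ElementTree as ET

# Sample document (made up for illustration): the built-in references
# stand in for characters that would otherwise look like markup.
source = "<body>1 &lt; 2 &amp; 3 &gt; 2</body>"

root = ET.fromstring(source)

# After parsing, the text is ordinary character data with no references left.
parsed_text = root.text
print(parsed_text)

# On output, the serializer escapes the markup-significant characters again.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Without the references, the parser would have no way to tell a literal "<" in
the text from the start of a tag.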

I suspect that MSXML is being lenient when it allows you to have a reference
to an undeclared entity in your source XML/XHTML, or perhaps you're just not
using the method properly. In either case, rather than complain about your
undeclared entity, it's pretending that the reference is really just
character data, as if you had said

<body><![CDATA[Some text &dllr; ]]></body>
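
For comparison, here is roughly how a stricter parser behaves. This is a
sketch using Python's standard xml.etree.ElementTree (expat underneath), not
MSXML, so it only illustrates the general rule: the undeclared reference is
rejected, while the CDATA form parses as plain characters and then gets its
"&" escaped on output, which matches the result you reported.

```python
import xml.etree.ElementTree as ET

# No DTD, so "dllr" is undeclared: this parser reports an error
# instead of passing the reference through.
try:
    ET.fromstring("<body>Some text &dllr; </body>")
    undeclared_accepted = True
except ET.ParseError as err:
    undeclared_accepted = False
    print("parse error:", err)

# The CDATA form is legal: inside CDATA, "&dllr;" is just character data.
root = ET.fromstring("<body><![CDATA[Some text &dllr; ]]></body>")
cdata_text = root.text

# Serializing that character data escapes the "&".
cdata_output = ET.tostring(root, encoding="unicode")
print(cdata_output)
```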

Keep in mind that entities are the physical storage units that together make
up a single logical document. It is this logical document that you are
concerned with in XSL. You feed the XSL processor the
document entity (the primary entity for the document), and it hands it off to
an XML parser, which abstracts away all of the 'physical' aspects of it -- so
things like general parsed entity references go away, replaced by their
replacement text, and the bytes<->character encodings for each entity also go
away. The parser reports on the single logical hierarchy of elements,
attributes, and character data that it finds, and the XSL processor makes use
of that information to create an internal representation of the XPath/XSLT
node tree. This node tree has no concept of entities and entity references.
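
As a rough illustration of the entity reference disappearing, the sketch
below declares the entity in an internal DTD subset and parses it with
Python's xml.etree.ElementTree. The replacement text "$" is my assumption;
you never said what dllr is meant to stand for.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration: the replacement text "$" is assumed purely
# for illustration.
source = """<!DOCTYPE body [
  <!ENTITY dllr "$">
]>
<body>Some text &dllr; </body>"""

root = ET.fromstring(source)

# The tree holds only the replacement text; nothing records that it
# came from an entity reference.
expanded_text = root.text
print(repr(expanded_text))

# Writing the tree back out therefore cannot restore "&dllr;".
reserialized = ET.tostring(root, encoding="unicode")
print(reserialized)
```

An XSL processor is in the same position: by the time it builds its node
tree, the reference has already been replaced.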

If you are thinking "I want &foo; in the output" then you have to either fake
it by creating a text node with the characters & f o o ; and setting
disable-output-escaping="yes" on the xsl:text or xsl:value-of that creates
it, or you have to rely on your XSL processor's output method to know that
certain characters or node types should be emitted as entity references. This
is the kind of thing that the HTML output method does in the XSL processors
that support it -- certain markup characters, and most text node characters
outside the ASCII range (0x00..0x7F), are emitted as numeric character
references like &#1234; or as SGML entity references like &bull;.
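
To show what that kind of output escaping amounts to, here is a hedged Python
sketch. It is not how any particular XSL processor implements its HTML output
method; named_refs is a hypothetical helper and the sample text is made up.
It only covers the non-ASCII part, so markup characters such as < and & would
still need escaping separately.

```python
from html.entities import codepoint2name

# Made-up sample: a bullet (U+2022) followed by "café".
text = "\u2022 caf\u00e9"

# Numeric character references: encode to ASCII and let Python replace
# anything that does not fit with &#NNNN;.
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(numeric)


def named_refs(s):
    """Hypothetical helper: use a named HTML entity where one exists,
    otherwise fall back to a numeric character reference."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


named = named_refs(text)
print(named)
```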

Even though you are managing to somehow get references to undeclared entities
parsed as just the characters that make up the entity reference, at that
point it is just character data and "&" is no longer special, so it *has* to
be escaped on output, to preserve its status as character data rather than
markup. The only workaround is available if your XSL processor supports the
disable-output-escaping option that was introduced in the later working
drafts of XSLT 1.0. In that case you can do something like:

<xsl:template match="body">
    <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>

