Re: differentiation between text() and entities???

Subject: Re: differentiation between text() and entities???
From: Mike Brown <mike@xxxxxxxx>
Date: Tue, 13 Jun 2000 19:47:22 -0600 (MDT)

> If the input contains "some text&copy;other text"
> 
> then I ONLY get "some textother text" as the output?

If &copy; is in your source XML/XHTML, is the 'copy' entity declared as
"&#169;" in the DTD? If not, the document shouldn't even make it past the
parser. XML itself predefines only &lt;, &gt;, &amp;, &apos; and &quot;;
&copy; is available only when a DTD (such as the XHTML DTDs) declares it.
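
To make the point about declarations concrete, here is a minimal sketch
using Python's standard xml.etree.ElementTree module. That choice is my
assumption, not something from the original question, and the <doc>
element and the variable names are made up for illustration. It parses the
same text twice, once with 'copy' declared in an internal DTD subset and
once without.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: 'copy' is declared in the internal DTD subset.
DECLARED = (
    '<!DOCTYPE doc [ <!ENTITY copy "&#169;"> ]>'
    '<doc>some text&copy;other text</doc>'
)

# Same content, but with no declaration for 'copy'.
UNDECLARED = '<doc>some text&copy;other text</doc>'

# The parser expands the reference; the tree holds only character data.
declared_text = ET.fromstring(DECLARED).text

# Without a declaration the document is rejected by the parser.
try:
    ET.fromstring(UNDECLARED)
    undeclared_rejected = False
except ET.ParseError:
    undeclared_rejected = True

print(repr(declared_text), undeclared_rejected)
```

Other parsers report the undeclared case differently, but a conforming
parser should not silently drop the reference.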

If you're not actually parsing a document and you are saying that you
have, for example, created a text node containing "some text&copy;other
text" in a DOM, then I would expect that to be equivalent to putting "some
text&amp;copy;other text" in an XML document that was parsed. You wouldn't
need to declare &copy; in this case because the only entity reference is
to &amp;, which is predefined in XML.
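
The DOM case can be sketched the same way. This assumes Python's standard
xml.dom.minidom as the DOM implementation, which may differ in detail from
whatever DOM you are using; the <doc> element name is hypothetical. The
ampersand in the text node is ordinary character data, so it is escaped on
serialization and comes back unchanged when the result is parsed again.

```python
from xml.dom import minidom

# Build a tiny DOM by hand instead of parsing a document.
doc = minidom.Document()
root = doc.createElement("doc")
doc.appendChild(root)

# The "&copy;" here is just six literal characters, not an entity reference.
root.appendChild(doc.createTextNode("some text&copy;other text"))

# Serializing escapes the ampersand as &amp;.
serialized = root.toxml()

# Parsing the serialized form gives back the original characters.
reparsed = minidom.parseString(serialized).documentElement
round_tripped = "".join(node.data for node in reparsed.childNodes)

print(serialized)
print(round_tripped)
```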

> Any workaround to get "some text&copy;other text" as the output???

There shouldn't be a need for a workaround. If you have a general parsed
entity reference in the input like that, it will be expanded by the parser
and replaced with some character data (character number 169, the copyright
symbol, if the entity has been declared properly).

When the parser reports to the XSLT processor what is in the document, it
will say there's a string of character data "some text^other text", where ^
is the copyright symbol, which I can't type in this particular editor. The
XSLT processor will put this character data into a text node in the source
tree. There is no such thing as an entity reference in the XSLT/XPath data
model. Nor should there be, since XSLT acts on node trees typically
derived from parsed XML documents, and by the rules of XML 1.0, a parser
is supposed to expand, not report the presence of, references to general
parsed entities.

When you use xsl:value-of to create a new text node in the result tree,
telling it to use the contents of that text node from the source tree, the
new node will be a copy of the original. When the XSLT processor
serializes and outputs the result tree as a byte stream, if you have
<xsl:output method="html"/> in your stylesheet, it will most likely output
the character data as the ASCII bytes for "some text&copy;other text". If
the output method is xml, "some text&#169;other text" is the likely output.
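
As a rough illustration of the serialization step only: the sketch below
is not an XSLT processor. It assumes Python's xml.etree.ElementTree as a
stand-in serializer and a hypothetical <doc> element, and it shows how a
text node containing character 169 can be written out as a numeric
character reference when the output encoding is ASCII.

```python
import xml.etree.ElementTree as ET

# A result-tree stand-in: one element whose text contains U+00A9.
root = ET.Element("doc")
root.text = "some text\u00a9other text"

# ASCII cannot represent U+00A9 directly, so the serializer
# falls back to a numeric character reference.
ascii_bytes = ET.tostring(root, encoding="us-ascii")

print(ascii_bytes)
```

Either way the character survives; only its spelling in the byte stream
changes.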

Under no circumstances should the character disappear altogether.

> Interestingly and surprisingly, if I match on html tag and write html tag to
> the output, I get the desired result but I really don't want to write the
> output enclosed between <html> and </html>. I am unable to get a good
> explanation of this behavior.

Are you looking at the output as it is rendered in a web browser or are
you looking at the actual output that is being produced by Xalan or
whatever server-side XSLT processor you are using?

I suspect that your problem has nothing to do with the code samples you
provided and has more to do with the XML parser you are using, the XSLT
namespace you are using (let's see the rest of the stylesheet), the XSLT
processor (did you say it was Xalan?), or what you're viewing the output
with.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list