RE: [xsl] character entities

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 28 Apr 2005 22:42:14 +0100
> I get that the "&amp;" is the unicode for an ampersand

No, it's the representation of an ampersand in source XML (and in HTML).
The character itself is Unicode U+0026; "&amp;" is an entity reference that
stands for it.

> But what do you mean by "serialize"?

Converting from the tree representation of XML to the "source"
representation; it's the reverse of parsing. XSLT processing has three
phases: parse the source XML to a tree, transform the tree, then serialize
the result tree to XML.
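The three phases can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree module as a stand-in for an XSLT processor (an assumption made for illustration; a real transformation would be driven by a stylesheet), and the "transform" step is a hypothetical placeholder that simply wraps the input in a new <div> element.

```python
import xml.etree.ElementTree as ET

# Source XML as it would appear in a file: the ampersand and the e-acute
# are written as an entity reference and a character reference.
source = "<p>Fish &amp; chips at the caf&#233;</p>"

# Phase 1: parse. The parser replaces both references with real characters.
p = ET.fromstring(source)
print(repr(p.text))  # 'Fish & chips at the café'

# Phase 2: transform. Hypothetical stand-in for a stylesheet:
# wrap the parsed paragraph in a new <div> element.
div = ET.Element("div")
div.append(p)

# Phase 3: serialize the result tree back to markup. encoding="unicode"
# returns a str; the serializer must escape "&" but may write "é" as is.
output = ET.tostring(div, encoding="unicode")
print(output)  # <div><p>Fish &amp; chips at the café</p></div>
```

Note that the tree in the middle never contains "&amp;" or "&#233;"; those exist only in the source and output text.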
> So, if a character reference is in an XML source file it will show up as a
> reference in an XHTML output file (I got the impression from other posts
> that the XSLT would change the reference into the actual character)?

The tree will contain the actual character. The parser will turn the
character reference into the actual character. The serializer has a choice:
if the character is available in the chosen output encoding it can output it
"as is"; if not (or if it is a special character like ampersand) then it
must use a character reference or entity reference.
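A minimal sketch of that choice, assuming Python's xml.etree.ElementTree serializer as an illustrative stand-in for an XSLT processor's: the same tree is serialized once to UTF-8 and once to US-ASCII.

```python
import xml.etree.ElementTree as ET

# One tree, containing a literal "&" and a literal "é" (U+00E9).
p = ET.Element("p")
p.text = "Fish & chips at the caf\u00e9"

# UTF-8 can represent "é", so it is written as raw bytes (C3 A9).
# "&" is special in XML, so it becomes "&amp;" whatever the encoding.
utf8_bytes = ET.tostring(p, encoding="utf-8")

# US-ASCII cannot represent "é", so the serializer falls back to a
# character reference for it.
ascii_bytes = ET.tostring(p, encoding="us-ascii")

print(utf8_bytes)   # b'<p>Fish &amp; chips at the caf\xc3\xa9</p>'
print(ascii_bytes)  # b'<p>Fish &amp; chips at the caf&#233;</p>'
```

The ampersand is escaped in both outputs; only the e-acute changes form, because US-ASCII has no byte for it.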


> So what is the accepted way to add character references to the output?
> Would I have to run some kind of find-and-replace script after the XSLT
> transformation? What do other people do?

You shouldn't care whether characters in the output are represented as
themselves or as character references. No process that consumes the output
XML or HTML is going to make a distinction, so it doesn't matter. Leave it
to the serializer to decide. If you want to look at the output in a text
editor, choose an encoding that your text editor can handle, and the
serializer will do the rest.
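To see that a consumer can't tell the difference, here is a minimal sketch that parses three spellings of the same content. It assumes Python's xml.etree.ElementTree as a stand-in for "any process that consumes the output". (In XSLT itself, the output encoding is requested with the encoding attribute of xsl:output.)

```python
import xml.etree.ElementTree as ET

# Three byte-level spellings of the same paragraph: "é" written raw (UTF-8),
# as a decimal character reference, and as a hex character reference.
variants = [
    "<p>caf\u00e9 &amp; bar</p>".encode("utf-8"),
    b"<p>caf&#233; &amp; bar</p>",
    b"<p>caf&#xE9; &amp; bar</p>",
]

# Parse each one; the parser resolves every reference to its character.
texts = [ET.fromstring(v).text for v in variants]

# Every variant yields identical character data.
print(texts)                 # ['café & bar', 'café & bar', 'café & bar']
print(len(set(texts)) == 1)  # True
```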

Michael Kay
http://www.saxonica.com/
