Re: [xsl] Preseving character entities

Subject: Re: [xsl] Preseving character entities
From: Geert Josten <Geert.Josten@xxxxxxxxxxx>
Date: Mon, 29 Nov 2004 17:11:19 +0100
You will have to elaborate. I don't see why named entity references 'have' to be of SDATA type.

That isn't what I said. Clearly entity references don't have to be SDATA.


Entities are always named, so "named entity references" are the same as
"entity references" and can be of any type allowed in the language under
consideration (XML or SGML).

Did I say 'named entity references'? That's silly, I meant 'named character references' of course. :-)


SGML, but not XML, had a specific SDATA entity type that was used for
producing something that was essentially a named character. &alpha; in
HTML doesn't really expand to anything: it's just reported by the parser
as a named reference to a system-specific character called alpha.

Why can't CDATA entities be used for this? These are allowed in both XML and SGML. That is what I meant to ask (first line of this message). How else are XHTML entities/named characters defined in all those XHTML DTDs of the W3C?


In XHTML it's completely different: &alpha; expands to the Unicode
character 945 and will just be reported as that by the XML parser, with
the fact that there was an entity reference perhaps not being reported
at all (and being ignored even if reported when building an XPath data
model).
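
To make the point above concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not something used in this thread). The internal subset declares alpha in the same form the XHTML entity sets use; the document string and the helper name parsed_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares alpha as
# character 945, mirroring the declaration form in the XHTML entity sets.
DOC = '<!DOCTYPE p [<!ENTITY alpha "&#945;">]><p>&alpha; and &#945;</p>'

def parsed_text(xml_source: str) -> str:
    """Return the text content of the root element after parsing."""
    return ET.fromstring(xml_source).text

text = parsed_text(DOC)
# The entity reference and the character reference both arrive as U+03B1;
# the resulting tree keeps no trace of which form the source used.
print([hex(ord(ch)) for ch in text if ch == "\u03b1"])
```

An XPath data model built from this parse would see two identical characters, which is the behaviour described above.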

You mean, the named character references (entities) are resolved to predefined Unicode characters, just as if one were reading XHTML with an XML parser using an XHTML DTD which defines these entities as such. It could be that browsers resolve the entities _after_ reading the data (contrary to what XML parsers usually do), but then again, I'm not a browser expert. They _could_ do it just like XML parsers...


While it's not uncommon for people to use the terminology appropriate to
HTML SDATA references when talking about XML entity references, it's a
practice best avoided. &alpha; and &#945; are different beasts and
expanded at different times by an XML parser. The XML spec calls the
first an entity reference and the second a character reference. One may
argue whether another terminology would have been clearer, but
what's done is done, and so using "character reference", even prefixed
with "named", to refer to something that by definition is not a character
reference just helps deepen confusion, not lessen it.
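
One way to see that the two are different beasts is to check which of them need a declaration. The following is a rough sketch, again assuming Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_source: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False

# A character reference needs no declaration at all.
char_ref_ok = is_well_formed("<p>&#945;</p>")

# An entity reference with no matching declaration is rejected.
undeclared_entity_ok = is_well_formed("<p>&alpha;</p>")

# The same entity reference is accepted once a DTD declares it.
declared_entity_ok = is_well_formed(
    '<!DOCTYPE p [<!ENTITY alpha "&#945;">]><p>&alpha;</p>'
)

print(char_ref_ok, undeclared_entity_ok, declared_entity_ok)
```

The character reference stands on its own, while the entity reference only has a meaning through a declaration.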

To put your words into mine: I am substituting a confusing but 'correct' terminology with a clearer but incorrect one. I get your point, but I am still no more convinced than before...


:-P

Grtz
