Subject: Understanding character handling
From: Paul Prescod <paul@xxxxxxxxxxx>
Date: Thu, 07 Jan 1999 14:27:09 -0600
The root of this problem is the lack of a data model in the XML specification. Those of us with an SGML background understand the data model implicitly, and many others have picked it up, but some obviously have not yet. Some of us pushed very hard for a data model in the XML specification. Instead it appears in the DOM *and* XSL *and* XPointer *and* ...

I will use Python/OQL syntax to try to explain this in terms of the XSL data model. (There is no syntax for the data model.)

Consider the parsing process. It builds a grove from text:

    &lt;                -> DataChar( "<" )
    <![CDATA[<]]>       -> DataChar( "<" )
    <![CDATA[&]]>       -> DataChar( "&" )
    <![CDATA[&foo;]]>   -> [DataChar( "&" ),DataChar( "f" ), DataChar( "o" ...
    <FOO>a...</FOO>     -> Element( gi = "FOO", content=[DataChar( "a" ), ... ] )

Now consider the serialization of XML. It builds text from a grove. But it has many options, because there are many equivalent serializations for a given character:

    DataChar( "<" ) -> &lt;
                    -> <![CDATA[<]]>
                    -> &#60;
                    -> &#x3C;
    Element( gi = "FOO", content=[DataChar( "a" ), ... ] ) -> <FOO>a...</FOO>

Now consider the (logically identical) XSL templates:

    <FOO><![CDATA[&foo;]]></FOO>
    <FOO>&amp;foo;</FOO>

When the stylesheet is parsed, either one becomes:

    Element( gi = "FOO", content = [DataChar( "&" ),DataChar( "f" ), DataChar( "o" ...] )

The encoding of the ampersand is irrelevant. Now this is a literal result element, with literal text within it. So it is copied to the output tree like this:

    Element( gi = "FOO", content = [DataChar( "&" ),DataChar( "f" ), DataChar( "o" ...] )

In other words, it is identical. Now if you go back to the serialization model above, you'll see that the correct serialization for this *as an XML file* is:

    <FOO>&amp;foo;</FOO>

Get it?
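For anyone who wants to poke at the parse/serialize round trip directly, here is a minimal sketch in real Python. It assumes the standard library's xml.etree.ElementTree as a stand-in for the grove; that is not the XSL data model, and the helper names parse_text and round_trip are made up for this illustration. It only shows that the different spellings collapse to the same characters once parsed, and that the serializer then picks one spelling on its own.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_source):
    """Parse a one-element document and return its character data."""
    return ET.fromstring(xml_source).text


def round_trip(xml_source):
    """Parse to a tree, then let the serializer choose the output spelling."""
    return ET.tostring(ET.fromstring(xml_source), encoding="unicode")


# Four spellings of the same data character "<".
LESS_THAN_FORMS = [
    "<x>&lt;</x>",
    "<x><![CDATA[<]]></x>",
    "<x>&#60;</x>",
    "<x>&#x3C;</x>",
]

# After parsing, the original spelling is gone; only the character remains.
print([parse_text(form) for form in LESS_THAN_FORMS])
# ['<', '<', '<', '<']

# Both template spellings hold the same five characters: & f o o ;
print(parse_text("<FOO><![CDATA[&foo;]]></FOO>"))  # &foo;
print(parse_text("<FOO>&amp;foo;</FOO>"))          # &foo;

# Serializing either tree gives the same text, with the ampersand escaped.
print(round_trip("<FOO><![CDATA[&foo;]]></FOO>"))  # <FOO>&amp;foo;</FOO>
print(round_trip("<FOO>&amp;foo;</FOO>"))          # <FOO>&amp;foo;</FOO>
```

Note that the CDATA section does not survive the round trip. The tree never knew it was there.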
The reason this is tricky is:

a) there are about four steps between the input and the output
b) XSL's syntax tricks you into thinking you are working with strings
   when you are really working with trees
c) The data model is expressed in the wrong place
d) There is no syntax for talking about the data model (other than
   Python/OQL)

 Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"You have the wrong number."
"Eh? Isn't that the Odeon?"
"No, this is the Great Theater of Life. Admission is free, but the
taxation is mortal. You come when you can, and leave when you must. The
show is continuous. Good-night."
    -- Robertson Davies, "The Cunning Man"

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list