Understanding character handling

Very slight quibble with Paul's explanation, for DOM-based implementations.

The DOM does allow parsers to record the use of CDATA sections, so the two cases
     <FOO><![CDATA[&foo;]]></FOO>
     <FOO>&amp;foo;</FOO>
can be distinguished in the DOM model, and the first line could be output
exactly as it was read in.
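
To make that concrete, here is a minimal sketch using Python's
xml.dom.minidom as a stand-in DOM implementation (my choice for
illustration; the helper name describe is made up). Other parsers may
behave differently, since recording CDATA is allowed rather than required.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def describe(xml_text):
    """Return (node kind, character data, reserialized XML) for the root's first child."""
    root = parseString(xml_text).documentElement
    child = root.firstChild
    # A CDATA section is recorded as its own node type, distinct from plain text.
    kind = "CDATA" if child.nodeType == Node.CDATA_SECTION_NODE else "text"
    return kind, child.data, root.toxml()


cdata_case = describe("<FOO><![CDATA[&foo;]]></FOO>")
escaped_case = describe("<FOO>&amp;foo;</FOO>")

print(cdata_case)
print(escaped_case)
```

Both nodes carry the same character data, but the node type differs, and
that is what lets a serializer write the CDATA form back out.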

But the DOM _doesn't_ preserve the difference between &amp; and &#38; and
&#x26; (or whatever the numbers are). As far as the DOM (and XML?) is
concerned, these are identical, and the second line may be correctly
output with any of these.
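
A quick way to see this, again sketched with Python's xml.dom.minidom
(an assumption on my part, not something the DOM spec mandates): all
three spellings parse to the same text, and the serializer picks one
spelling on the way out regardless of what was read in.

```python
from xml.dom.minidom import parseString

# Three ways of writing an ampersand in the source document.
spellings = ["&amp;", "&#38;", "&#x26;"]

results = []
for spelling in spellings:
    root = parseString("<FOO>%sfoo;</FOO>" % spelling).documentElement
    # Join all child nodes in case the parser splits the text into pieces.
    text = "".join(node.data for node in root.childNodes)
    results.append((text, root.toxml()))

for spelling, result in zip(spellings, results):
    print(spelling, "->", result)
```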

User-created entities are a different kettle of worms. According to the DOM
spec, a validating parser _can_ retain the fact that data was obtained via
an entity reference -- but doesn't have to. In the former case, you know
which entity was used and you can recreate the reference on output. In the
latter case, that information is lost. If this is important to you, check
how a parser handles entity references before you select it. (I'm not
defending the fact that it was left open, just pointing out the hazard.)
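
As an illustration of the lossy case, here is a sketch with Python's
xml.dom.minidom, which as far as I know expands internal entities at
parse time rather than keeping reference nodes. The entity name "who"
and its value are made up for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document declaring one internal entity in its DTD subset.
source = '<!DOCTYPE FOO [<!ENTITY who "world">]><FOO>Hello &who;</FOO>'

root = parseString(source).documentElement

# Collect the character data; the entity has already been expanded.
text = "".join(node.data for node in root.childNodes)
reserialized = root.toxml()

print(text)
print(reserialized)
```

With this parser the output contains the replacement text only, so the
original &who; reference cannot be recreated from the tree.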


______________________________________
Joe Kesselman  / IBM Research
Unless stated otherwise, all opinions are solely those of the author.



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Current Thread
  Understanding character handling
    keshlam - Fri, 8 Jan 1999 11:40:53 -0500 <=
    Keith Visco - Fri, 08 Jan 1999 12:43:48 -0500
    Andrew Bunner - Fri, 08 Jan 1999 10:39:48 -0800
    David Carlisle - Fri, 8 Jan 1999 22:54:12 GMT
    David Carlisle - Fri, 8 Jan 1999 17:46:44 GMT
