Understanding character handling

Subject: Understanding character handling
From: keshlam@xxxxxxxxxx
Date: Fri, 8 Jan 1999 11:40:53 -0500
Very slight quibble with Paul's explanation, for DOM-based implementations.

The DOM does allow parsers to record the use of CDATA. So the two cases
     <FOO><![CDATA[&foo;]]></FOO>
     <FOO>&amp;foo;</FOO>
can be distinguished in the DOM, and the first line could be output
exactly as it was read in.
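
As a rough illustration of the CDATA point, here is a minimal sketch using Python's xml.dom.minidom. That parser is my choice for the example, not one named in this thread, and the helper name describe() is made up. It reports whether the child of FOO came back as a CDATA section or as ordinary text, and what the element looks like when written back out.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def describe(xml_text):
    """Return (node kind, character data, re-serialized element) for the root's first child."""
    root = parseString(xml_text).documentElement
    child = root.firstChild
    # A CDATA section is a distinct DOM node type, so the parser can record it.
    kind = "CDATA" if child.nodeType == Node.CDATA_SECTION_NODE else "Text"
    return kind, child.data, root.toxml()


cdata_case = describe("<FOO><![CDATA[&foo;]]></FOO>")
escaped_case = describe("<FOO>&amp;foo;</FOO>")

print(cdata_case)
print(escaped_case)
```

Both cases carry the same character data, but the node kind differs, which is what lets a serializer write the first case back as a CDATA section.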

But it _doesn't_ preserve the difference between &amp; and &#38; and &#x26;
(or whatever the numbers are). As far as the DOM (and XML?) is concerned,
these are identical, and the second line may be correctly output with any
of these.
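
The same kind of sketch for the three spellings of the ampersand, again assuming Python's xml.dom.minidom as the DOM implementation (my assumption, with the helper text_of() invented for the example). It parses each spelling, collects the character data under FOO, and re-serializes the element to see which spelling the serializer picks.

```python
from xml.dom.minidom import parseString


def text_of(xml_text):
    """Concatenate the character data of all children of the root element."""
    root = parseString(xml_text).documentElement
    return "".join(child.data for child in root.childNodes)


spellings = [
    "<FOO>&amp;foo;</FOO>",   # predefined entity
    "<FOO>&#38;foo;</FOO>",   # decimal character reference
    "<FOO>&#x26;foo;</FOO>",  # hexadecimal character reference
]

parsed = [text_of(s) for s in spellings]
reserialized = [parseString(s).documentElement.toxml() for s in spellings]

print(parsed)
print(reserialized)
```

Once parsed, nothing in the tree says which spelling was used, so the serializer is free to pick one form for all three inputs.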

User-created entities are a different kettle of worms. According to the DOM
spec, a validating parser _can_ retain the fact that data was obtained via
an entity reference -- but doesn't have to. In the former case, you know
which entity was used and you can recreate the reference on output. In the
latter case, that information is lost. If this is important to you, use
it as a guide in selecting parsers. (I'm not defending the fact that it was
left open, just pointing out the hazard.)
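
One way to check which behaviour a given parser has is to feed it a small document with a user-defined entity and look for EntityReference nodes in the result. The sketch below assumes Python's xml.dom.minidom, which is a non-validating parser and only stands in here for "some DOM implementation". The entity foo and its replacement text "bar" are made up for the example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical document: an internal DTD subset declares the entity "foo".
SOURCE = '<!DOCTYPE FOO [<!ENTITY foo "bar">]><FOO>&foo;</FOO>'

root = parseString(SOURCE).documentElement
child_types = [child.nodeType for child in root.childNodes]

# True only if the parser kept the reference as an EntityReference node.
keeps_reference = Node.ENTITY_REFERENCE_NODE in child_types

# Character data actually present under FOO after parsing.
text = "".join(
    child.data for child in root.childNodes if child.nodeType == Node.TEXT_NODE
)

print(keeps_reference, repr(text))
```

If keeps_reference comes back False, the parser has expanded the entity in place and the reference cannot be recreated from the tree on output.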


______________________________________
Joe Kesselman  / IBM Research
Unless stated otherwise, all opinions are solely those of the author.