Re: Special entity characters in Shift-JIS XSL.

Subject: Re: Special entity characters in Shift-JIS XSL.
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 17 Dec 1999 10:05:07 GMT

> I think the OPPOSITE of flaky is the word I would use to describe an entity
> identification paradigm that allows the entity to remain in its encoded
> form, yet still be identified as an entity.  I think solid is more the word.

You could build a solid system on that basis, but it wouldn't be XML.

> how can it then be passed to any more parsers expecting 7-bit ASCII
> characters?

The XML character set is _always_ Unicode. If the encoding isn't the
default UTF-8 or UTF-16, not every character can be written directly as
character data, but you can always use the &# syntax (a character
reference) to reach any Unicode character. An XML parser _has_ to treat
`A' and `&#65;' identically and report `character number 65' to the
application, whichever form was in the input file. If your application
_needs_ to see `&#65;' rather than `A', then it isn't an XML application
(it could be an SGML one).
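The following is a small illustration of the point about `A' and `&#65;'. It uses Python's standard xml.etree.ElementTree parser. The choice of parser and the sample documents are assumptions for illustration and are not part of the original message. Both spellings come back as the same character. A character reference also reaches a character that the declared encoding (here ISO-8859-1) cannot represent as raw bytes.

```python
import xml.etree.ElementTree as ET

# Illustrative sample documents: they differ only in how the letter A is
# written, once literally and once as a numeric character reference.
literal = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>A</p>'
reference = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>&#65;</p>'

# The parser reports the same character (code point 65) for both forms.
text_a = ET.fromstring(literal).text
text_b = ET.fromstring(reference).text
print(text_a == text_b, ord(text_a))  # True 65

# HIRAGANA LETTER A (U+3042) has no byte in ISO-8859-1, but a character
# reference still delivers it to the application as one Unicode character.
hiragana = ET.fromstring(
    b'<?xml version="1.0" encoding="ISO-8859-1"?><p>&#x3042;</p>'
).text
print(hex(ord(hiragana)))  # 0x3042
```

The application never sees whether the input said `A' or `&#65;'. It only receives the character.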

> What if each of those parsers followed the spec, the first
> transforming the character into a 2-byte unicode character, leaving the
> others to see the two bytes as simply two different characters in the
> stream?

This can't happen: in a well-formed XML document you _always_ know
whether a multi-byte encoding is being used. Either the <?xml declaration
specifies the encoding (for example a single-byte encoding such as
Latin-1), or no encoding is declared and a multi-byte encoding is being
used: UTF-8, unless the first two bytes of the file are a byte order
mark (BOM), in which case it's UTF-16.
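The following is a simplified sketch of the encoding detection described above. It assumes only the cases the message mentions: a UTF-16 byte order mark, or an ASCII-compatible start with an optional encoding declaration. The function name sniff_encoding is hypothetical. A real parser follows the fuller autodetection rules in Appendix F of the XML 1.0 specification.

```python
import re

# Simplified: matches the encoding pseudo-attribute of an XML declaration
# at the very start of an ASCII-compatible document (single or double quotes).
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_encoding(data: bytes) -> str:
    """Hypothetical helper: guess a document's encoding from its first bytes."""
    # A UTF-16 byte order mark in the first two bytes means UTF-16.
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "UTF-16"
    # Otherwise an explicit encoding declaration, if present, names it.
    m = _DECL.match(data)
    if m:
        return m.group(1).decode("ascii")
    # No BOM and no encoding declaration: the default is UTF-8.
    return "UTF-8"


print(sniff_encoding(b'<?xml version="1.0" encoding="Shift_JIS"?><a/>'))  # Shift_JIS
print(sniff_encoding("<a/>".encode("utf-16")))  # UTF-16
print(sniff_encoding(b"<a/>"))  # UTF-8
```

In every case the encoding is settled before any character data is read. As a result, a two-byte sequence in a document declared as Shift_JIS is decoded as one character, not two.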

David
 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
