Re: [xsl] 8bit ascii encoding

Subject: Re: [xsl] 8bit ascii encoding
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 23 Aug 2002 12:33:04 +0100
> Yeah... anywhere nice?

I would say that it was suitably far from computers, but it seems that
even 3000m up a Swiss mountain you still expect to find an internet cafe
these days (I resisted the urge to log in and answer any xsl-list
messages though :-)

> ha.. nice.  After some testing it seems that char references display
> fine, while characters themselves do not 

Well, presumably they would if you wrote the characters in the right
encoding. At a guess, it sounds like you are writing bytes that
correspond to iso-8859-1 characters into a utf-8 encoded stream. If so
you'll get the wrong characters (or more often an error) except for the
ASCII range, which is the part of utf-8 that uses one byte per character.
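
To make that mismatch concrete, here is a minimal sketch in Python (not
your actual C code, and the sample string is just something I picked).
It encodes the same text both ways and then tries to read the latin-1
bytes back as utf-8:

```python
# U+00E9 (e acute) is one byte in iso-8859-1 but two bytes in utf-8.
text = "caf\u00e9"

latin1_bytes = text.encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9'

# Reading latin-1 bytes as if they were utf-8 fails here, because 0xE9
# announces a multi-byte sequence that never arrives.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Pure ASCII text is byte-for-byte identical in both encodings,
# so it survives the mix-up unharmed.
ascii_bytes = "cafe".encode("iso-8859-1")
ascii_round_trip = ascii_bytes.decode("utf-8")
```

Only the ASCII string survives; anything above 127 needs the writer and
the reader to agree on the encoding.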

> I think the reason IE isn't picking up that each char is two
> bytes (utf-8)

If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
utf-8. (Unicode 3.1 and later have characters that need more than 2
bytes even in utf-16, the so-called surrogate pairs.) utf-8 requires
1 to 4 bytes, depending on the character.
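
A rough way to see the difference is to count bytes for a few
characters. This is only a sketch in Python, and the four sample
characters are my own choice:

```python
# Sample characters: ASCII 'A', U+00E9, U+20AC (euro sign) and
# U+1D11E (musical G clef, which lies outside the 16-bit range).
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

byte_counts = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-be" is used so that no byte order mark is counted.
    utf16_len = len(ch.encode("utf-16-be"))
    byte_counts[ch] = (utf8_len, utf16_len)

for ch, (utf8_len, utf16_len) in byte_counts.items():
    print(f"U+{ord(ch):04X}: utf-8 {utf8_len} bytes, utf-16 {utf16_len} bytes")
```

The first three take 1, 2 and 3 bytes in utf-8 and 2 bytes each in
utf-16; the last one takes 4 bytes in both, as a surrogate pair in
utf-16.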

> So I guess I have two options...
> 1. persevere trying to get IE to treat the output as two byte chars 

I think your problem is using the phrase "two byte chars" which leads to
confusion. Characters have a unicode number but do not correspond
directly to any number of bytes.
Different encodings map subsets of the unicode character set into
particular byte combinations.

> 2. pass through all char refs to the output un-escaped, and let IE
> escape them...

All character references are replaced by the referenced character by an
XML parser. So there is no way to "pass through" references unchanged.
The XSLT system cannot tell whether a reference or a character was in
the original data.
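
You can check this with any XML parser. The sketch below uses Python's
standard xml.etree.ElementTree rather than your XSLT engine, so take it
as an illustration of the parsing rule only; the <p> element is made up:

```python
import xml.etree.ElementTree as ET

# One document uses a character reference, the other the literal character.
with_reference = ET.fromstring("<p>caf&#233;</p>")
with_character = ET.fromstring("<p>caf\u00e9</p>")

# After parsing, both hold the same four-character string; nothing
# records which form was used in the source.
same_text = with_reference.text == with_character.text
parsed_length = len(with_reference.text)
```

Both trees carry the identical text node, which is all an XSLT
processor ever gets to see.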

> Is this the best option?
It is still not clear what you are trying to do, but there should be no
real reason why your C part cannot handle whatever encoding is coming
out of the XSLT. It isn't clear from your description whether this is
utf-8 or utf-16. You may find it easier if you specified
encoding="iso-8859-1" and used latin-1 in the C part.
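
As a sketch of what latin-1 output looks like at the byte level, the
Python below stands in for the serializer. That is an assumption on my
part: your XSLT engine does this step itself, and the sample string is
invented. Characters that latin-1 cannot represent come out as
character references, which is how a serializer is expected to handle
them:

```python
# U+00E9 exists in iso-8859-1; U+20AC (euro sign) does not.
text = "caf\u00e9 \u20ac"

# "xmlcharrefreplace" writes unrepresentable characters as &#NNNN;
# instead of raising an error.
latin1_output = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# Every character that latin-1 can represent occupies exactly one byte,
# so fixed-width handling on the receiving side stays simple.
first_four = latin1_output[:4]
```

Here latin1_output is b'caf\xe9 &#8364;': one byte for each latin-1
character, and a plain ASCII reference for the euro sign.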

