RE: [xsl] 8bit ascii encoding

Subject: RE: [xsl] 8bit ascii encoding
From: "Andrew Welch" <awelch@xxxxxxxxxxxxxxx>
Date: Fri, 23 Aug 2002 13:53:32 +0100
ha! no wonder I get confused...

> If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Unicode 3 requires more than 2 bytes per character even in
> utf-16, the so-called surrogate pairs). utf-8 requires 1 - 5 bytes,
> depending on the character.

If my chars are two bytes each then I'm using utf-16, but utf-8 can
consist of 1-5 bytes per char... I think I need to read some more.
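
A quick way to check the byte counts is to encode a few characters and
measure the result. This is only a sketch in Python (not something used
in this thread), and the sample characters are arbitrary picks. With a
current Python the utf-8 lengths come out between 1 and 4 bytes, since
the present-day definition of UTF-8 (RFC 3629) stops at 4; earlier
definitions allowed longer sequences.

```python
# Illustrative only: how many bytes one character needs in each encoding.
# The sample characters are arbitrary choices, not taken from the thread.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]  # A, e-acute, euro sign, musical G clef


def encoded_lengths(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-be" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in samples:
    u8, u16 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: utf-8={u8} bytes, utf-16={u16} bytes")
```

The last sample lies outside the 16-bit range, so utf-16 needs a
surrogate pair (4 bytes) for it.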

At the moment, I'm using the xml output method with ascii encoding, and
telling IE the encoding is utf-8 (in the meta), so any chars not in
ascii should be output as character references and displayed correctly
in IE, as that is set to UTF-8.
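
The idea behind this setup can be sketched outside XSLT. The Python
below is an assumption-laden illustration, not the actual pipeline: the
sample string and the <p> wrapper element are made up. It shows that
ascii output with character references is also valid utf-8, and that an
XML parser turns the references back into the characters themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text containing two non-ASCII characters.
text = "caf\u00e9 \u20ac5"

# Serialise to pure ASCII: every character outside ASCII becomes a
# numeric character reference such as &#233;.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# ASCII is a subset of UTF-8, so labelling these bytes as utf-8 is harmless.
print(ascii_bytes.decode("utf-8"))

# An XML parser resolves the references; the parsed text no longer
# records whether a reference or a literal character was in the input.
doc = ET.fromstring(b"<p>" + ascii_bytes + b"</p>")
roundtrip = doc.text
print(roundtrip == text)
```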

Currently, this results in any chars not in the ascii range being
displayed as a single square box, which is progress from before, where I
was getting between 3 and 7 chars displayed for any 'special' character...
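
One way several characters can appear in place of one (not necessarily
what happened here) is when bytes written in one encoding are read back
in another. A minimal Python sketch, with a made-up helper name
(misread) and arbitrary sample characters:

```python
def misread(text, written_as, read_as):
    """Encode text one way, then decode the bytes another way.

    Hypothetical helper for illustration; undecodable bytes become U+FFFD.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


# utf-8 bytes read as iso-8859-1: one character turns into several.
print(repr(misread("\u00e9", "utf-8", "iso-8859-1")))  # 2 characters
print(repr(misread("\u20ac", "utf-8", "iso-8859-1")))  # 3 characters

# iso-8859-1 byte read as utf-8: invalid sequence, shown as U+FFFD.
print(repr(misread("\u00e9", "iso-8859-1", "utf-8")))
```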

Anyway, this is getting slightly off-topic and I think I'm fighting a
losing battle, as anything I do has to go through the ActiveX control,
which I haven't got control of (or any understanding of ;) so I'll call
it a day for now.

Thanks for the continuing education in character encoding - one day I
will get it!

cheers
andrew 



> -----Original Message-----
> From: David Carlisle [mailto:davidc@xxxxxxxxx]
> Sent: 23 August 2002 12:33
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] 8bit ascii encoding
> 
> 
> 
> > Yeah... anywhere nice?
> 
> I would say that it was suitably far from computers, but it seems that
> even 3000m up a Swiss mountain you still expect to find an internet
> cafe these days (I resisted the urge to log in and answer any xsl-list
> messages though:-)
> 
> > ha.. nice.  After some testing it seems that char references display
> > fine, while characters themselves do not 
> 
> well presumably they would if you wrote the characters in the right
> encoding. Guessing, it sounds like you are writing bytes that
> correspond to iso-8859-1 characters into a utf8 encoded stream. If so
> you'll get the wrong characters (or more often an error) except for
> that part of utf-8 that happens to use one byte per character.
> 
> > I think the reason IE isn't picking up that each char is two
> > bytes (utf-8)
> 
> If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Unicode 3 requires more than 2 bytes per character even in
> utf-16, the so-called surrogate pairs). utf-8 requires 1 - 5 bytes,
> depending on the character.
> 
> 
> > So I guess I have two options...
> > 
> > 1. persevere trying to get IE to treat the output as two byte chars 
> 
> I think your problem is using the phrase "two byte chars", which leads
> to confusion. Characters have a unicode number but do not correspond
> directly to any number of bytes.
> Different encodings map subsets of the unicode character set into
> particular byte combinations.
> 
> 
> > 2. pass through all char refs to the output un-escaped, and let IE
> > escape them...
> 
> All character references are replaced by the referenced character by
> an XML parser. So there is no way to "pass through" references
> unchanged. The XSLT system can not tell whether a reference or a
> character was in the original data.
> 
> 
> > Is this the best option?
> It is still not clear what you are trying to do but there should be no
> real reason why your C part can not handle whatever encoding is coming
> out of the XSLT. It isn't clear from your description whether this is
> utf-8 or utf-16. You may find it easier if you specified
> encoding="iso-8859-1" and used latin-1 in the C part.
> 
> David
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


