Subject: RE: [xsl] 8bit ascii encoding
From: "Andrew Welch" <awelch@xxxxxxxxxxxxxxx>
Date: Fri, 23 Aug 2002 13:53:32 +0100

ha! no wonder I get confused...

> If each char (in unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Unicode 3 requires more than 2 bytes per character even in
> utf-16, the so called surrogate pairs). utf-8 requires 1 - 5 bytes,
> depending on the character.

If my chars are two bytes each then I'm using utf-16, but utf-8 can consist of 1-5 bytes per char... I think I need to read some more.

At the moment, I'm using an xml output method with ascii encoding, and telling IE the encoding is utf-8 (in the meta), so any chars not in ascii should be output as references and displayed correctly in IE, as that is set to UTF-8. Currently, this results in any char not in the ascii range being displayed as a single square box, which is progress from before, where I was getting between 3 and 7 chars displayed for any 'special' character...

Anyway, this is getting slightly off-topic and I think I'm fighting a losing battle, as anything I do has to go through the ActiveX control, which I haven't got control of (or any understanding of ;) so I'll call it a day for now.

Thanks for the continuing education in character encoding - one day I will get it!

cheers
andrew

> -----Original Message-----
> From: David Carlisle [mailto:davidc@xxxxxxxxx]
> Sent: 23 August 2002 12:33
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] 8bit ascii encoding
>
> > Yeah... anywhere nice?
>
> I would say that it was suitably far from computers, but it seems that
> even 3000m up a swiss mountain you still expect to find an internet cafe
> these days (I resisted the urge to log in and answer any xsl-list
> messages though:-)
>
> > ha.. nice. After some testing it seems that char references display
> > fine, while characters themselves do not
>
> well presumably they would if you wrote the characters in the right
> encoding. Guessing it sounds like you are writing bytes that correspond
> to iso-8859-1 characters into a utf8 encoded stream.
> If so you'll get
> the wrong characters (or more often an error) except for that part of
> utf-8 that happens to use one byte per character.
>
> > I think the reason IE isn't picking up that each char is two
> > bytes (utf-8)
>
> If each char (in unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Unicode 3 requires more than 2 bytes per character even in
> utf-16, the so called surrogate pairs). utf-8 requires 1 - 5 bytes,
> depending on the character.
>
> > So I guess I have two options...
> >
> > 1. persevere trying to get IE to treat the output as two byte chars
>
> I think your problem is using the phrase "two byte chars" which leads to
> confusion. Characters have a unicode number but do not correspond
> directly to any number of bytes.
> Different encodings map subsets of the unicode character set into
> particular byte combinations.
>
> > 2. pass through all char refs to the output un-escaped, and let IE
> > escape them...
>
> All character references are replaced by the referenced character by an
> XML parser. So there is no way to "pass through" references unchanged.
> The XSLT system can not tell whether a reference or a character was in
> the original data.
>
> > Is this the best option?
>
> It is still not clear what you are trying to do but there should be no
> real reason why your C part can not handle whatever encoding is coming
> out of the XSLT. It isn't clear from your description whether this is
> utf-8 or utf-16. You may find it easier if you specified
> encoding="iso-8859-1" and used latin-1 in the C part.
>
> David
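
The point in the quoted message that a character has a Unicode number but no fixed byte count can be checked with a small sketch. This is only an illustration in Python, not anything from the thread; the sample characters and the helper name `byte_lengths` are assumptions made for the example. One caveat on the figures quoted above: under the current definition of UTF-8 (RFC 3629) a character takes 1 to 4 bytes; longer sequences were only permitted by older definitions.

```python
# Illustrative only: the same character occupies a different number of
# bytes depending on the encoding chosen.

def byte_lengths(ch):
    """Return (utf-8 length, utf-16 length, iso-8859-1 length or None)."""
    try:
        latin1 = len(ch.encode("iso-8859-1"))
    except UnicodeEncodeError:
        latin1 = None  # character is outside the Latin-1 repertoire
    # "utf-16-be" is used so that no byte-order mark is counted
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be")), latin1

# A, e-acute, euro sign, and a character outside the 16-bit range
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))

# Writing iso-8859-1 bytes into a stream that is then read as utf-8:
latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict utf-8 decode fails:", exc.reason)

# A lenient reader substitutes U+FFFD, which is often drawn as a box
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(repr(lenient))
```

Only the ASCII letter takes one byte in every encoding shown, which matches the remark that the one-byte part of utf-8 is the only part that survives such a mix-up.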

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
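
Two points from the thread can be sketched briefly: ascii output turns non-ascii characters into numeric character references, and an XML parser resolves references so that they cannot be told apart from literal characters afterwards. The sketch below uses Python's standard library rather than an XSLT processor, so it only approximates what an `xsl:output` with an ascii encoding does; the sample strings and variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9 \u20ac5"

# Encoding to ascii with character references for anything outside ascii
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# One document uses a character reference, the other the literal
# character (as utf-8 bytes, the default when no encoding is declared).
from_reference = ET.fromstring(b"<p>caf&#233;</p>")
from_literal = ET.fromstring("<p>caf\u00e9</p>".encode("utf-8"))

# After parsing, both hold the same text; the reference is gone.
same_text = from_reference.text == from_literal.text
print(same_text)

# Serialising with the default us-ascii encoding writes a reference
# again, whichever form the input used.
serialised = ET.tostring(from_literal)
print(serialised)
```

Because the output contains only ascii bytes, it reads the same whether the receiving side is told utf-8, iso-8859-1, or ascii, as long as that side resolves the references.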