Subject: Re: [xsl] Problem with Chinese (Solution)
From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Wed, 8 Aug 2001 08:36:32 +0100
Glad to see your problem was solved, Shaun, and your posting of a full summary is much appreciated, but there are a few points in your explanation of how the solution worked that need comment:

> This works great for standard encodings, but it will never work for
> encodings like Chinese (GB2312).
>

If by "standard encodings" you mean utf-8 or us-ascii, you're right, but only because the encodings for the abstract characters common to both happen to be indistinguishable, so in the absence of a different encoding declaration the parser assumes the default utf-8 and all is well. You would also get away with it if your data were in ISO-8859-1 and happened not to contain any characters outside the subrange that overlaps with us-ascii, but that would be sheer luck. For your "encodings like..." you need to substitute "any encoding other than the default", which would include, say, ISO-8859-1 containing accented characters. Such encodings must be appropriately declared in your input and output XML, otherwise the parse will fail (and of course you also have to load the data as XML where there is a specific call for that purpose).

> However !!! I did notice one interesting undesirable "feature" in the
> MSXML. If you put in some <HEAD></HEAD> tags into the
> above XSL, then your output HTML contains the following by
> magic.
>
> <head>
> <META http-equiv="Content-Type" content="text/html; charset=UTF-16">
> </head>

ISTR that this has been touched on here before, but since I'm not an intensive MSXML user I can't be sure. Rogue reversions to UTF-16 did occur with some earlier MS XML handling, but I thought that was now fixed. Do you still get this if you specify the correct output encoding attribute in an xsl:output element in your XSL? If so, what happens if you also explicitly generate an HTML HEAD that includes a META tag with the correct charset declaration? Does this still produce a charset value of UTF-16?
If so, that would indeed be a bug, though I'm not convinced that the behaviour as you've described it is one.

> Maybe this is what MSXML thinks the closest thing to GB2312 is.

Surely no one at Microsoft could be daft enough to think that.

> The bad thing is that the IE5.5 browser doesn't know how to Auto-Select
> the GB2312 encoding when this is present. This might be considered a
> bug.

Well, one can argue about the wisdom of including an auto-select feature at all, and question the heuristics by which IE5's auto-select operates, but what's happening here is that IE5.5 sees that the page author has gone to the bother of declaring a charset in a META tag and so believes what that tag says. That seems to me a defensible choice on the part of the coders, and again not really a bug.

Michael

---------------------------------------------------------
Michael Beddow http://www.mbeddow.net/
XML and the Humanities page: http://xml.lexilog.org.uk/
---------------------------------------------------------

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
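The two encoding points in this message can be illustrated with a minimal sketch. It assumes Python's standard library as a stand-in for MSXML and IE5.5; neither product is modelled here, and the sample strings are hypothetical. The parser part uses ISO-8859-1, the message's own example of a non-default encoding. It shows that undeclared us-ascii content parses under the default UTF-8 assumption, that ISO-8859-1 bytes containing an accented character are rejected until the encoding is declared, and that GB2312 bytes decode correctly only under the right label. The UTF-16 decode merely mimics a consumer that trusts a wrong charset label, as the browser trusts the META tag; it is not how IE actually decodes pages.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the raw bytes parse as well-formed XML.

    With no XML declaration (and no byte-order mark) the parser assumes UTF-8.
    """
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def root_text(data: bytes) -> str:
    """Parse raw XML bytes and return the root element's text content."""
    return ET.fromstring(data).text


# Hypothetical sample: "café" encoded as ISO-8859-1, so the accented e is
# the single byte 0xE9, which is not a valid UTF-8 sequence on its own.
latin1_body = "<p>café</p>".encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body

ascii_ok = parses(b"<p>plain</p>")   # ASCII bytes are also valid UTF-8
undeclared_ok = parses(latin1_body)  # read as UTF-8, so the parse fails
declared_text = root_text(declared)  # declaration lets the parser decode 0xE9

# Hypothetical sample: "中文" encoded as GB2312 (four bytes).
gb_bytes = "中文".encode("gb2312")
right_label = gb_bytes.decode("gb2312")     # correct label: original text back
wrong_label = gb_bytes.decode("utf-16-le")  # wrong label: two unrelated characters, no error

print("undeclared us-ascii parses:", ascii_ok)
print("undeclared ISO-8859-1 parses:", undeclared_ok)
print("declared ISO-8859-1 parses:", parses(declared))
print("wrong label gives original text:", wrong_label == right_label)
```

Note that the wrong-label decode raises no error at all: the bytes are simply reinterpreted, which is why a page carrying an incorrect charset declaration displays garbage rather than failing visibly. In the MSXML setting discussed in the thread, the corresponding remedies are the encoding declaration in the input and the `encoding` attribute on `xsl:output`; the sketch does not exercise XSLT itself.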