Re: [xsl] Loosing encoding information

Subject: Re: [xsl] Loosing encoding information
From: "Jonathan Perret" <jonathan@xxxxxxxxxxxx>
Date: Wed, 20 Feb 2002 15:38:18 +0100
> Julian Reschke:
> > > Set oXml = Server.CreateObject("MSXML2.DOMDocument")
> > > Set oXsl = Server.CreateObject("MSXML2.DOMDocument")
> > >
> > > call oXml.loadXML(vXmlData)
> > > call oXsl.load(Server.MapPath(".\Stylesheets\File.xsl"))
> > >
> > > sData=oXml.transformNode(oXsl)
> > > Response.Write(sData)
> >
> >Never do that. You'll lose encoding information.
> >
> >Use
> >
> > oXml.transformNode(oXsl, Response)
> >
> >instead.

Julian certainly meant the transformNodeToObject method.
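
For illustration, here is how the quoted code might look with that
method. This is only a sketch, written in server-side JScript rather
than VBScript. The helper name transformToResponse is made up for this
example, and Server and Response are passed in as arguments. It assumes
the ASP Response object accepts stream output, as Julian's suggestion
implies.

```javascript
// Hypothetical helper: transforms xmlData with the stylesheet at xslPath
// and lets MSXML write the result directly into the Response object.
function transformToResponse(Server, Response, xmlData, xslPath) {
  var oXml = Server.CreateObject("MSXML2.DOMDocument");
  var oXsl = Server.CreateObject("MSXML2.DOMDocument");

  // Load synchronously so both documents are complete before transforming.
  oXml.async = false;
  oXsl.async = false;

  oXml.loadXML(xmlData);
  oXsl.load(Server.MapPath(xslPath));

  // No intermediate string is returned: the processor serializes into
  // Response itself, so the encoding named in <xsl:output> is the one used.
  oXml.transformNodeToObject(oXsl, Response);
}

// In an ASP page this would be called as:
//   transformToResponse(Server, Response, vXmlData, ".\\Stylesheets\\File.xsl");
```

Passing Server and Response explicitly is just a convenience for the
sketch. In a real page the calls could be written inline, as in the
original VBScript.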

> >
> >And complain to MSDN about their faulty examples.
> >
> I have never heard of this loss of encoding information before, and I use
> this code all over when transforming my documents. Can you (or anyone)
> please explain to me what exactly I am losing? (When I say encoding, I
> strongly presume that you don't mean the character encoding as in the
> problem I had, but in a broader way...?)

When you use the transformNode() method, MSXML ignores the
"encoding" attribute on the <xsl:output> element. The output
of the transform is always a UTF-16 string (BSTR, in COM parlance).
In theory, obtaining a UTF-16 string and converting it later
(that's what Response.Write does) to the client's expected
encoding (generally iso-8859-1) is equivalent to having the XSL
processor spit out iso-8859-1 text in the first place (because
UTF-16 can represent all Unicode characters).
In practice, however, this has several drawbacks:
- the obvious memory and CPU waste;
- when using the "html" output method, MSXML inserts in the
output a META tag that describes the output's encoding. Because
the transform triggered by transformNode() always outputs to
UTF-16 (ignoring the "encoding" attribute), the META tag says
that the result is written in UTF-16 (and it is, indeed). When,
later down the line, Response.Write converts the UTF-16 to
iso-8859-1, it does not know about that META tag, so the
document that the client receives is a proper iso-8859-1 encoded
html document, but with a META tag that tells the browser that
the document is in fact UTF-16 encoded.
This is very wrong, and most often breaks the browser.
- transformNode() has to build the whole result string before
anything can be sent. Using transformNodeToObject, combined with
Response.Buffer=False, allows the result to be streamed to the
client as it is generated (using incremental processing when
possible).
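
The META tag mismatch can be reproduced without MSXML. The following is
a minimal sketch in JavaScript for Node.js, not ASP code. Buffer stands
in for the conversion that Response.Write performs, and the exact META
tag text is an assumption modelled on what the "html" output method
typically emits.

```javascript
// What transformNode() hands back: a string whose META tag declares UTF-16,
// because that is the encoding the processor actually wrote to.
const html =
  '<html><head>' +
  '<META http-equiv="Content-Type" content="text/html; charset=UTF-16">' +
  '</head><body>caf\u00e9</body></html>';

// Stand-in for Response.Write: the string is re-encoded to iso-8859-1
// ("latin1" in Node) and the META tag is left untouched.
const bytesOnTheWire = Buffer.from(html, 'latin1');

// A client that ignores the META tag and reads iso-8859-1 recovers the text.
const readAsLatin1 = bytesOnTheWire.toString('latin1');

// A client that believes the META tag reads byte pairs as UTF-16 code units.
const readAsUtf16 = bytesOnTheWire.toString('utf16le');

console.log(readAsLatin1 === html); // true
console.log(readAsUtf16 === html);  // false: the text is unrecognizable
```

The bytes are perfectly valid iso-8859-1. Only the label inside the
document is wrong, which is why the problem is easy to miss on the
server side.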

> Incidentally, I looked up examples of the "right"
> way, and they showed (transforming on the client):
> ..
> document.write(xml.transformNode(xsl))
> Which is not exactly the same as Julian Reschke's example, but doesn't
> convert it into a string before output, either.

This does output to a string. transformNode() always returns a UTF-16
string, so you should avoid it whenever it's practical.
I don't know if you can use transformNodeToObject on the client though.
It would probably look like this:

  xml.transformNodeToObject(xsl, document)

But the document object needs to support IStream for this to work.
If it does, this is certainly the best way to transform.

Actually, the best way to transform is to use IXSLProcessor, but it
changes nothing as far as the encoding issues are concerned.

