Re: [xsl] Fw: Select entire XML doc [FURTHER]

Subject: Re: [xsl] Fw: Select entire XML doc [FURTHER]
From: "Karl Stubsjoen" <karl@xxxxxxxxxxxxx>
Date: Fri, 28 Feb 2003 15:15:10 -0700
Wow... that was "overwhelmingly" excellent.

Errr... I think I shall learn how to post XML from the client using
JavaScript and the XML DOM ; )


----- Original Message -----
From: "Mike Brown" <mike@xxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, February 28, 2003 2:56 PM
Subject: Re: [xsl] Fw: Select entire XML doc [FURTHER]

> Karl Stubsjoen wrote:
> > Wow... that was most awesome.  Thanks for the help, it really made a lot
> > of sense.  And indeed, I do need to be careful of HTML tags becoming
> > Once the XML has been properly serialized in a text area element, what is
> > the proper way to deserialize it?
> Do you mean you want to turn
> <someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>
> into
> <someXmlData><tag>chardata</tag></someXmlData>
> ?
> ...This is a FAQ and is generally beyond the scope of what XML should be used
> for, or what XSLT can do without extension functions. But if you insist, you
> will need to write an extension function that takes the content of the
> someXmlData element (or any string, really), passes it into an XML parser, and
> converts the parser's results to a node-set or result tree fragment. See your
> XSLT processor docs for how to write an extension function (it varies). Your
> processor may already have such a function available (but likely not).
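For anyone who would rather do the re-parsing step outside XSLT, here is a minimal sketch in Python of what such a function boils down to. It is not an XSLT extension function, and the helper name unescape_embedded_xml is made up for illustration; it only uses the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

def unescape_embedded_xml(outer_xml: str) -> ET.Element:
    """Parse the root element's text content as XML and graft it in as a child.

    Hypothetical helper: assumes the escaped markup is the only content of
    the root element and is itself a single well-formed element.
    """
    outer = ET.fromstring(outer_xml)
    inner_text = outer.text or ""
    inner = ET.fromstring(inner_text)  # raises ParseError if not well-formed
    outer.text = None                  # drop the escaped string...
    outer.append(inner)                # ...and attach the parsed element
    return outer

doc = unescape_embedded_xml(
    "<someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>"
)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

The inner parse fails loudly if the string is not well-formed XML, which is usually what you want when the text came from an editable form field.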
> Or do you mean after the HTML has been rendered in the browser, and the user
> submits the form having the textarea with the possibly-edited XML? That's a
> whole 'nother can of worms, due to encoding issues, which I am all too happy
> to write about, although it is technically off-topic for this list.
> First, in general, you should not be passing XML around in HTML form data if
> the intent is to have a general-purpose XML editing system, although as long
> as you stick to pure ASCII, or just treat it as an uneditable binary file,
> then things should be fine.
> The problems begin with how form data is handled. A browser transmits the
> data, which is Unicode, encoded as if it were going into a URL. This means
> that certain characters in the ASCII range (code points 0 to 127) and all
> characters beyond the ASCII range (code points 128 to 1114111) are first
> encoded as bytes, then represented as ASCII bytes for the characters "%xx"
> where xx is the hexadecimal representation for a byte. The ASCII-range
> characters always use the us-ascii encoding as the basis for the %-escaping,
> while the non-ASCII characters typically (it's not enforced by any standard)
> use the encoding *of the HTML document containing the form from which this
> data was submitted*.
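To make the dependence on the page encoding concrete, here is a small Python sketch that %-escapes the same text on two different bases. It uses urllib.parse.quote as a stand-in for what a browser does; real browsers usually send a space in form data as "+" and may leave some punctuation unescaped, so treat the exact output as illustrative.

```python
from urllib.parse import quote

# "\u00a1" is the inverted exclamation mark, code point U+00A1.
text = "\u00a1Hola amigo!"

# safe="" means everything except letters, digits and "_.-~" gets escaped.
as_utf8 = quote(text, safe="", encoding="utf-8")
as_latin1 = quote(text, safe="", encoding="iso-8859-1")

print(as_utf8)    # two escaped bytes for the first character
print(as_latin1)  # one escaped byte for the first character
```

The ASCII part of the string comes out identical in both cases; only the non-ASCII character differs, which is why the receiver cannot tell the two apart without knowing the encoding.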
> So for example if you have in your textarea the character data "¡Hola amigo!"
> and the HTML with the form was utf-8 encoded, and the browser user didn't
> override the interpreted encoding on their end, then the form will be
> submitted using utf-8 as the basis for the %-escaped form data:
>   %C2%A1Hola%20amigo!
> whereas if the HTML were iso-8859-1 encoded, it would be coming through as
>   %A1Hola%20amigo!
> On the receiving end, the form data needs to be decoded. Most servers provide
> an API for receiving decoded form data in your application, be it CGI
> environment variables or getParameter() methods on HTTP request objects or
> what have you. But since most browsers do not communicate the details of the
> encoding they used as the basis for the %-escaping, the server makes a guess,
> and usually guesses wrong. So for example, while
>    %C2%A1Hola%20amigo!
> unambiguously means bytes
>    C2 A1 48 6F 6C 61 20 61 6D 69 67 6F 21
> ...the API might mistakenly assume that these are iso-8859-1 and will decode
> it for you into the string "Â¡Hola amigo!". In fact, this happens quite often.
> So you'll have to be prepared to transcode: re-encode the string using the
> same encoding that the server assumed, and then decode it using the encoding
> that you know the HTML form used (you might send the latter in a hidden
> field). Either that, or pull the raw data out of the HTTP request and
> decode it yourself.
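The transcoding step can be sketched in a few lines of Python. The function name repair and the two encoding arguments are hypothetical; in a real application the "assumed" encoding is whatever your server API used, and the "actual" encoding is whatever you know the form page was served in.

```python
from urllib.parse import unquote_to_bytes

def repair(misdecoded: str, assumed: str, actual: str) -> str:
    """Undo a wrong decode: back to the original bytes, then decode properly."""
    return misdecoded.encode(assumed).decode(actual)

# The raw bytes behind the %-escapes ("¡" is U+00A1, which is C2 A1 in utf-8).
raw = unquote_to_bytes("%C2%A1Hola%20amigo%21")

# What a server API hands you if it wrongly assumes iso-8859-1.
wrong = raw.decode("iso-8859-1")

fixed = repair(wrong, assumed="iso-8859-1", actual="utf-8")

# The alternative route: take the raw bytes and decode them yourself.
direct = raw.decode("utf-8")

print(repr(wrong), repr(fixed), repr(direct))
```

The round trip is lossless here because iso-8859-1 maps every byte to a character. If the server assumed an encoding that rejects or replaces some bytes, the original data may already be gone, and pulling the raw request data is the only safe route.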
> Once you have the properly decoded string, you can feed it to an XML parser as
> a Unicode string, so that the parser will ignore the encoding declaration in
> the XML's prolog. If you were to feed the raw bytes (the C2 A1 48 etc. bytes)
> to the parser, you would have to declare the encoding externally, because
> there's a chance that the declaration in the prolog has become inaccurate
> while it was edited and reencoded.
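Here is a hedged Python illustration of the two options, using the standard library's ElementTree on top of expat. The greeting element is a made-up example. Other parsers behave differently (some refuse a Unicode string that carries an encoding declaration), so check your own parser's docs.

```python
import xml.etree.ElementTree as ET

# The prolog claims iso-8859-1, but after the form round trip the data is
# really utf-8, so the declaration is stale.
text = ('<?xml version="1.0" encoding="iso-8859-1"?>'
        '<greeting>\u00a1Hola amigo!</greeting>')

# Option 1: feed the decoded Unicode string; the declaration is not used.
from_string = ET.fromstring(text).text

# Option 2: feed raw bytes and state the real encoding externally.
raw = text.encode("utf-8")
parser = ET.XMLParser(encoding="utf-8")  # overrides the declared encoding
parser.feed(raw)
from_bytes = parser.close().text

# For contrast: raw bytes with no override, so the stale declaration wins.
trusting = ET.fromstring(raw).text

print(repr(from_string), repr(from_bytes), repr(trusting))
```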
> You didn't know what you were getting into, did you? Like I said, in general,
> HTML forms and the server-side APIs for processing them are just not designed
> to be a general-purpose XML editing system, at least not in an idiot-proof
> way. The culprits are really HTTP and MIME; HTML is just working around their
> restrictions. And browser vendors choose the path of least disruption,
> choosing not to implement some of HTML's features that could easily work
> around some of these issues (e.g., they do have a way of transmitting encoding
> info, but they just don't do it, to "keep people's scripts from breaking").
> --
>   Mike J. Brown   |
>   Denver, CO, USA |
>  XSL-List info and archive:
