Re: [xsl] Fw: Select entire XML doc [FURTHER]

Subject: Re: [xsl] Fw: Select entire XML doc [FURTHER]
From: "Karl Stubsjoen" <karl@xxxxxxxxxxxxx>
Date: Fri, 28 Feb 2003 15:15:10 -0700
Wow... that was "overwhelmingly" excellent.
Karl

Errr... I think I shall learn how to post XML from the client using
JavaScript and the XML DOM ;)

Karl


----- Original Message -----
From: "Mike Brown" <mike@xxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, February 28, 2003 2:56 PM
Subject: Re: [xsl] Fw: Select entire XML doc [FURTHER]


> Karl Stubsjoen wrote:
> > Wow... that was most awesome.  Thanks for the help, it really made a lot of
> > sense.  And indeed, I do need to be careful of HTML tags becoming malformed.
> > Once the XML has been properly serialized in a textarea element, what is the
> > proper way to deserialize it?
>
> Do you mean you want to turn
>
> <someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>
>
> into
>
> <someXmlData><tag>chardata</tag></someXmlData>
>
> ?
>
> ...This is a FAQ, and it is generally beyond the scope of what XML should be
> used for, or what XSLT can do without extension functions. But if you insist,
> you will need to write an extension function that takes the content of the
> someXmlData element (or any string, really), passes it to an XML parser, and
> converts the parser's results to a node-set or result tree fragment. See your
> XSLT processor docs for how to write an extension function (it varies). Your
> processor may already have such a function available (but likely not).
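
The core of such an extension function (take a string, parse it, attach the resulting nodes) can be sketched outside of XSLT. This is only an illustration in Python using the standard library's ElementTree; the helper name unescape_child is made up for this example, and how you would register something like it with an XSLT processor depends entirely on the processor.

```python
import xml.etree.ElementTree as ET


def unescape_child(element):
    """Parse the element's text as XML and attach the result as a child.

    Hypothetical helper: assumes the text holds exactly one well-formed
    XML element; ET.fromstring raises ParseError otherwise.
    """
    fragment = ET.fromstring(element.text)
    element.text = None
    element.append(fragment)
    return element


doc = ET.fromstring("<someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>")
unescape_child(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

If the escaped text is not well-formed XML, the parse fails, which is one reason this kind of round trip is fragile.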
>
> Or do you mean after the HTML has been rendered in the browser, and the user
> submits the form having the textarea with the possibly-edited XML? That's a
> whole 'nother can of worms, due to encoding issues, which I am all too happy
> to write about, although it is technically off-topic for this list.
>
> First, in general, you should not be passing XML around in HTML form data if
> the intent is to have a general-purpose XML editing system, although as long
> as you stick to pure ASCII, or just treat it as an uneditable binary file,
> things should be fine.
>
> The problems begin with how form data is handled. A browser transmits the form
> data, which is Unicode, encoded as if it were going into a URL. This means
> that certain characters in the ASCII range (code points 0 to 127) and all
> characters beyond the ASCII range (code points 128 to 1114111) are first
> encoded as bytes, and then each byte is represented by the ASCII characters
> "%xx", where xx is the hexadecimal value of the byte. The ASCII-range
> characters always use the us-ascii encoding as the basis for the %-escaping,
> while the non-ASCII characters typically (it's not enforced by any standard)
> use the encoding *of the HTML document containing the form from which this
> data was submitted*.
>
> So for example if you have in your textarea the character data "¡Hola amigo!",
> and the HTML with the form was utf-8 encoded, and the browser user didn't
> override the interpreted encoding on their end, then the form will be
> submitted using utf-8 as the basis for the %-escaped form data:
>
>   %C2%A1Hola%20amigo!
>
> whereas if the HTML were iso-8859-1 encoded, it would be coming through as
>
>   %A1Hola%20amigo!
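
The dependence of the %-escaped form on the page encoding can be reproduced with Python's standard urllib.parse.quote. This is a sketch rather than what any particular browser does; safe="!" is passed only so that the exclamation mark stays unescaped, as in the example strings.

```python
from urllib.parse import quote

text = "\u00a1Hola amigo!"  # the string "¡Hola amigo!"; U+00A1 is the inverted "!"

# Same characters, two different byte encodings as the basis for %-escaping.
as_utf8 = quote(text, safe="!", encoding="utf-8")
as_latin1 = quote(text, safe="!", encoding="iso-8859-1")

print(as_utf8)    # %C2%A1Hola%20amigo!
print(as_latin1)  # %A1Hola%20amigo!
```

Browsers submitting application/x-www-form-urlencoded data normally send a space as "+" rather than "%20"; urllib.parse.quote_plus mirrors that convention.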
>
> On the receiving end, the form data needs to be decoded. Most servers provide
> an API for receiving decoded form data in your application, be it CGI
> environment variables or getParameter() methods on HTTP request objects or
> what have you. But since most browsers do not communicate the details of what
> encoding they used as the basis for the %-escaping, the server makes a guess,
> and usually guesses wrong. So for example, while
>
>    %C2%A1Hola%20amigo!
>
> unambiguously means bytes
>
>    C2 A1 48 6F 6C 61 20 61 6D 69 67 6F 21
>
> ...the API might mistakenly assume that these are iso-8859-1 and will decode
> it for you into the string "Â¡Hola amigo!". In fact, this happens quite often.
> So you'll have to be prepared to transcode: re-encode the string using the
> same encoding that the server assumed, and then decode it using the encoding
> that you know the HTML form used (you might send the latter in a hidden form
> field). Either that, or pull the raw data out of the HTTP request and properly
> decode it yourself.
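
Both repair strategies can be sketched in Python with the standard urllib.parse module. The field name "xml" and the assumption that the server guessed iso-8859-1 while the form was really utf-8 are made up for this illustration.

```python
from urllib.parse import parse_qs, unquote

# Raw form data as it might arrive; "xml" is a hypothetical field name.
raw = "xml=%C2%A1Hola%20amigo!"

# What a server API hands you if it wrongly assumes iso-8859-1.
misdecoded = unquote("%C2%A1Hola%20amigo!", encoding="iso-8859-1")

# Strategy 1, transcode: re-encode with the encoding the server assumed,
# then decode with the encoding the HTML form actually used.
repaired = misdecoded.encode("iso-8859-1").decode("utf-8")

# Strategy 2: take the raw data and decode it yourself with the right encoding.
direct = parse_qs(raw, encoding="utf-8")["xml"][0]

print(repr(misdecoded), repr(repaired), repr(direct))
```

Strategy 1 only works when the wrongly assumed encoding can round-trip every byte, which is the case for iso-8859-1.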
>
> Once you have the properly decoded string, you can feed it to an XML parser as
> a Unicode string, so that the parser will ignore the encoding declaration in
> the XML's prolog. If you were to feed the raw bytes (the C2 A1 48 etc. above)
> to the parser, you would have to declare the encoding externally, because
> there's a chance that the declaration in the prolog has become inaccurate
> while it was edited and reencoded.
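
As a rough illustration of the difference between parsing a decoded string and parsing raw bytes, here is a sketch using Python's standard ElementTree (expat-based). The greeting element and the stale iso-8859-1 declaration are invented for the example; other parsers may treat an encoding declaration inside a Unicode string differently.

```python
import xml.etree.ElementTree as ET

# The prolog claims iso-8859-1, but the bytes really are utf-8.
prolog = '<?xml version="1.0" encoding="iso-8859-1"?>'
decoded = prolog + "<greeting>\u00a1Hola amigo!</greeting>"
raw = decoded.encode("utf-8")

# Option 1: parse the already-decoded string; the declaration is not used.
from_text = ET.fromstring(decoded).text

# Option 2: parse the raw bytes, declaring the real encoding externally.
parser = ET.XMLParser(encoding="utf-8")
parser.feed(raw)
from_bytes = parser.close().text

# For contrast: trusting the stale declaration misreads the bytes.
trusted = ET.fromstring(raw).text

print(repr(from_text), repr(from_bytes), repr(trusted))
```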
>
> You didn't know what you were getting into, did you? Like I said, in general,
> HTML forms and the server-side APIs for processing them are just not equipped
> to be a general-purpose XML editing system, at least not in an idiot-proof
> way. The culprits are really HTTP and MIME; HTML is just working around their
> restrictions. And browser vendors choose the path of least disruption,
> choosing not to implement some of HTML's features that could easily work
> around some of these issues (e.g., they do have a way of transmitting encoding
> info, but they just don't do it, to "keep people's scripts from breaking").
>
> --
>   Mike J. Brown   |  http://skew.org/~mike/resume/
>   Denver, CO, USA |  http://skew.org/xml/
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>



