Subject: Re: [xsl] Fw: Select entire XML doc [FURTHER]
From: "Karl Stubsjoen" <karl@xxxxxxxxxxxxx>
Date: Fri, 28 Feb 2003 15:15:10 -0700

Wow... that was "overwhelmingly" excellent.
Karl

Errr... I think I shall learn how to post XML from the client using
JavaScript and the XML DOM ; )
Karl

----- Original Message -----
From: "Mike Brown" <mike@xxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, February 28, 2003 2:56 PM
Subject: Re: [xsl] Fw: Select entire XML doc [FURTHER]

> Karl Stubsjoen wrote:
> > Wow... that was most awesome. Thanks for the help, it really made a lot of
> > sense. And indeed, I do need to be careful of HTML tags becoming malformed.
> > Once the XML has been properly serialized in a text area element, what is
> > the proper way to deserialize it?
>
> Do you mean you want to turn
>
>   <someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>
>
> into
>
>   <someXmlData><tag>chardata</tag></someXmlData>
>
> ?
>
> ...This is a FAQ and is generally beyond the scope of what XML should be used
> for, or what XSLT can do without extension functions. But if you insist, you
> will need to write an extension function that takes the content of the
> someXmlData element (or any string, really), passes it into an XML parser, and
> converts the parser's results to a node-set or result tree fragment. See your
> XSLT processor docs for how to write an extension function (it varies). Your
> processor may already have such a function available (but likely not).
>
> Or do you mean after the HTML has been rendered in the browser, and the user
> submits the form having the textarea with the possibly-edited XML? That's a
> whole 'nother can of worms, due to encoding issues, which I am all too happy
> to write about, although it is technically off-topic for this list.
>
> First, in general, you should not be passing XML around in HTML form data, if
> the intent is to have a general-purpose XML editing system, although as long
> as you stick to pure ASCII, or just treat it as an uneditable binary file,
> then things should be fine.
>
> The problems begin with how form data is handled.
> A browser transmits the form data, which is Unicode, encoded as if it were
> going into a URL. This means that certain characters in the ASCII range
> (code points 0 to 127) and all characters beyond the ASCII range (code points
> 128 to 1114111) are first encoded as bytes, and then each byte is written as
> the three ASCII characters "%xx", where xx is the byte's value in hexadecimal.
> The ASCII-range characters always use the us-ascii encoding as the basis for
> the %-escaping, while the non-ASCII characters typically (it's not enforced by
> any standard) use the encoding *of the HTML document containing the form from
> which this data was submitted*.
>
> So for example if you have in your textarea the character data "¡Hola amigo!",
> and the HTML with the form was utf-8 encoded, and the browser user didn't
> override the interpreted encoding on their end, then the form will be
> submitted using utf-8 as the basis for the %-escaped form data:
>
>   %C2%A1Hola%20amigo!
>
> whereas if the HTML were iso-8859-1 encoded, it would be coming through as
>
>   %A1Hola%20amigo!
>
> On the receiving end, the form data needs to be decoded. Most servers provide
> an API for receiving decoded form data in your application, be it CGI
> environment variables or getParameter() methods on HTTP request objects or
> what have you. But since most browsers do not communicate the details of what
> encoding they used as the basis for the %-escaping, the server makes a guess,
> and usually guesses wrong. So for example, while
>
>   %C2%A1Hola%20amigo!
>
> unambiguously means bytes
>
>   C2 A1 48 6F 6C 61 20 61 6D 69 67 6F 21
>
> ...the API might mistakenly assume that these are iso-8859-1 and will decode
> it for you into the string "Â¡Hola amigo!". In fact, this happens quite often.
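
A minimal Python sketch of the two %-escapings described in the quoted text, and of a server guessing the wrong encoding. Python is only my choice for illustration; nothing in the thread depends on it. `quote` and `unquote_to_bytes` come from the standard library's `urllib.parse`, and the variable names are made up for this example.

```python
from urllib.parse import quote, unquote_to_bytes

# "\u00a1" is the inverted exclamation mark, so this is the sample text
# from the message: "¡Hola amigo!"
text = "\u00a1Hola amigo!"

# The same characters, %-escaped on top of two different byte encodings.
# safe="!" keeps the trailing "!" unescaped, as in the message's examples.
as_utf8 = quote(text, safe="!", encoding="utf-8")
as_latin1 = quote(text, safe="!", encoding="iso-8859-1")
print(as_utf8)    # %C2%A1Hola%20amigo!
print(as_latin1)  # %A1Hola%20amigo!

# Undoing the %-escaping is unambiguous: it yields raw bytes, not characters.
raw = unquote_to_bytes(as_utf8)
print(raw.hex(" ").upper())  # C2 A1 48 6F 6C 61 20 61 6D 69 67 6F 21

# A server that assumes iso-8859-1 for these utf-8 bytes produces mojibake.
misdecoded = raw.decode("iso-8859-1")
print(misdecoded)  # Â¡Hola amigo!
```

Only the last step involves a guess: the bytes are fixed by the request, and turning them back into characters depends entirely on which encoding the receiver assumes.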
> So you'll have to be prepared to transcode: re-encode the string using the
> same encoding that the server assumed, and then decode it using the encoding
> that you know the HTML form used (you might send the latter in a hidden form
> field). Either that, or pull the raw data out of the HTTP request and properly
> decode it yourself.
>
> Once you have the properly decoded string, you can feed it to an XML parser as
> a Unicode string, so that the parser will ignore the encoding declaration in
> the XML's prolog. If you were to feed the raw bytes (the %-decoded byte
> sequence shown earlier) to the parser, you would have to declare the encoding
> externally, because there's a chance that the declaration in the prolog has
> become inaccurate while it was edited and re-encoded.
>
> You didn't know what you were getting into, did you? Like I said, in general,
> HTML forms and the server-side APIs for processing them are just not equipped
> to be a general-purpose XML editing system, at least not in an idiot-proof
> way. The culprits are really HTTP and MIME; HTML is just working around their
> restrictions. And browser vendors choose the path of least disruption,
> choosing not to implement some of HTML's features that could easily work
> around some of these issues (e.g., they do have a way of transmitting encoding
> info, but they just don't do it, to "keep people's scripts from breaking").
>
> --
> Mike J. Brown   |  http://skew.org/~mike/resume/
> Denver, CO, USA |  http://skew.org/xml/
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
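
The transcoding step and the two ways of handing the result to a parser, as described in the quoted message, might look roughly like this in Python. This is a sketch under assumptions: the helper name `transcode` and the `<greeting>` document are hypothetical, the server is assumed to have guessed iso-8859-1 when the form was really utf-8, and the standard library's `xml.etree.ElementTree` stands in for whatever parser you actually use.

```python
import xml.etree.ElementTree as ET


def transcode(s: str, assumed: str, actual: str) -> str:
    """Undo a wrong decoding: recover the bytes, then decode them correctly.

    Hypothetical helper. It works when `assumed` maps every byte to a
    character (as iso-8859-1 does), so that no bytes were lost on the way in.
    """
    return s.encode(assumed).decode(actual)


# What the server API handed us after guessing iso-8859-1 for utf-8 bytes.
garbled = "\u00c2\u00a1Hola amigo!"
fixed = transcode(garbled, assumed="iso-8859-1", actual="utf-8")
print(fixed)  # ¡Hola amigo!

# Option 1: parse the properly decoded text as a Unicode string. The prolog
# still claims iso-8859-1, but the declaration is not consulted for str input.
doc_text = '<?xml version="1.0" encoding="iso-8859-1"?><greeting>' + fixed + "</greeting>"
from_text = ET.fromstring(doc_text)
print(from_text.text)  # ¡Hola amigo!

# Option 2: parse raw bytes and declare the encoding externally, overriding
# a prolog that has become inaccurate (the bytes here really are utf-8).
doc_bytes = doc_text.encode("utf-8")
from_bytes = ET.fromstring(doc_bytes, parser=ET.XMLParser(encoding="utf-8"))
print(from_bytes.text)  # ¡Hola amigo!
```

Sending the form's real encoding in a hidden field, as suggested above, is what would supply the `actual` argument in practice.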