Subject: Re: [xsl] xslt replace special characters
From: "Alice Fan" <arisuu@xxxxxxxxxxx>
Date: Mon, 11 Nov 2002 12:53:19 -0800
From: Mike Brown <mike@xxxxxxxx>
Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] xslt replace special characters
Date: Mon, 11 Nov 2002 13:38:52 -0700 (MST)
Alice Fan wrote:
> Thanks Greg. Right in the UI, we want the user to enter their URL. Their
> URL will most likely have name/value pairs. Is there an easier way? There
> is no other way of filtering '&' before it gets processed in the XSL?
It doesn't matter if they're entering a URL/URI or not. Any text that you intend to put into an XML document needs to be screened, to preserve well-formedness / parseability.
1. Always note the following:
- non-XML characters need to be removed or replaced (U+0000..U+0008, U+000B, U+000C, U+000E..U+001F, U+D800..U+DFFF, U+FFFE..U+FFFF)
- a string is not a URI if it violates URI syntax, so if the text is destined for a URI-pseudotype attribute value (like href or src in HTML/XHTML), characters above U+007F should be escaped by writing their equivalent UTF-8 bytes as '%xx' for each byte, where xx is the hex notation for the byte (though this isn't strictly necessary; a conforming HTML user agent will do this automatically)
- additional translation of ASCII-range characters (U+0000..U+007F) in text destined for URI attributes is not required but is wise, to ensure conformance to URI syntax; %-escape everything except a-z, A-Z, 0-9, and these: - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , [ ]
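The rules in item 1 can be sketched in Python. This is a minimal, hypothetical illustration: the helper names strip_non_xml and escape_uri are made up for this example and are not part of any library, and it assumes the user's input is already a Python str. The URI part leans on the standard library's urllib.parse.quote, which encodes the string as UTF-8 and writes each byte outside the safe set as %XX.

```python
import re
from urllib.parse import quote

# Code points XML 1.0 never allows (the list in item 1). Supplementary-plane
# characters (U+10000 and up) are legal and are left alone.
_NON_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def strip_non_xml(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters that cannot appear in an XML 1.0 document."""
    # A function as the replacement stops re from interpreting backslashes
    # that might appear in `replacement`.
    return _NON_XML.sub(lambda _m: replacement, text)


# Left unescaped: a-z, A-Z, 0-9 (always kept by quote()) plus this list.
_URI_SAFE = "-_.!~*'();/?:@&=+$,[]"


def escape_uri(text: str) -> str:
    """%-escape text destined for a URI attribute such as href or src."""
    # quote() encodes the str as UTF-8 and writes every byte outside the
    # safe set as %XX, so 'é' (U+00E9) becomes '%C3%A9'.
    return quote(text, safe=_URI_SAFE)


if __name__ == "__main__":
    raw = "http://example.com/a b?x=1&y=\u00e9\x0b"
    print(escape_uri(strip_non_xml(raw)))
    # prints http://example.com/a%20b?x=1&y=%C3%A9
```

Because '%' is not in the safe set, text that is already %-escaped gets escaped a second time ('%20' becomes '%2520'), so apply escape_uri to raw user input only.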
2. If and when the XML document exists in serialized form (i.e., as a string, not as a DOM object), note the following:
- if the text is not destined for a CDATA section, markup characters '&' and '<' need to be escaped
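- if the text is destined for a CDATA section, the sequence ']]>' must not appear in it; entity references aren't recognized inside CDATA, so "escaping" the '>' means ending the section after ']]' and starting a new one, i.e. ']]]]><![CDATA[>'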
- if the text is destined for a comment, it must not contain '--' or end with '-' (how you handle such an offense is up to you)
- if the text is destined for an attribute value delimited by apostrophes, then apostrophes in the value must be escaped (usually as &apos;, except in HTML, which doesn't define &apos;; use &#39; there)
- if the text is destined for an attribute value delimited by quotes, then quotes in the value must be escaped (usually as &quot;)
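- if the text is destined for a non-URI attribute value, then tab, LF, and CR need to be escaped as character references (&#9;, &#10;, &#13;) to facilitate round-tripping; written literally, attribute-value normalization turns them into spaces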
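For the serialized case in item 2, here is a similarly hypothetical Python sketch. The names escape_text, cdata_section, comment and quote_attr are illustrative only, not a library API (Python's own xml.sax.saxutils module does offer escape and quoteattr for the common cases). Raising an exception for text that can't go into a comment is just one assumed policy.

```python
def escape_text(text: str) -> str:
    """Escape text for element content (not inside a CDATA section)."""
    # Replace '&' first so the '&' in our own '&lt;' is not escaped again.
    # Escaping '>' too keeps ']]>', which XML forbids in character data, out.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def cdata_section(text: str) -> str:
    """Wrap text in CDATA, splitting the section wherever ']]>' occurs."""
    # References aren't recognized inside CDATA, so ']]>' is handled by closing
    # the section after ']]' and opening a new one that starts with '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def comment(text: str) -> str:
    """Wrap text in a comment; this sketch refuses text it can't hold as-is."""
    # A comment may not contain '--' or end with '-'; raising is one policy.
    if "--" in text or text.endswith("-"):
        raise ValueError("text cannot go into an XML comment unchanged")
    return "<!--" + text + "-->"


def quote_attr(value: str, delimiter: str = '"', html: bool = False) -> str:
    """Escape a non-URI attribute value and add the chosen delimiters."""
    out = value.replace("&", "&amp;").replace("<", "&lt;")
    if delimiter == '"':
        out = out.replace('"', "&quot;")
    else:
        # HTML 4 has no &apos; entity, so use a numeric reference there.
        out = out.replace("'", "&#39;" if html else "&apos;")
    # Attribute-value normalization turns literal tab/LF/CR into spaces;
    # character references survive parsing unchanged.
    for ch, ref in (("\t", "&#9;"), ("\n", "&#10;"), ("\r", "&#13;")):
        out = out.replace(ch, ref)
    return delimiter + out + delimiter


if __name__ == "__main__":
    print("<p title=" + quote_attr('say "hi"') + ">" + escape_text("a < b & c") + "</p>")
    # prints <p title="say &quot;hi&quot;">a &lt; b &amp; c</p>
```

Parsing the output with a conforming XML parser should give back the original attribute value, tab, LF and CR included.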
I probably missed one or two cases, but as you can see, you can't just slap any old text into a document and call it XML...
- Mike
mike j. brown | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | resume: http://skew.org/~mike/resume/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
Email: arisuu@xxxxxxxxxxx
Cell: (650) 483-8164
Work: (212) 201-0881
Thank